Abstract

This paper introduces a family of recursively defined estimators of the 
parameters of a diffusion process. We use ideas of stochastic algorithms 
for the construction of the estimators. Asymptotic consistency of these 
estimators and asymptotic normality of an appropriate normalization are 
proved. The results are applied to two examples from the financial literature,
the Cox-Ingersoll-Ross model and the constant elasticity of variance (CEV)
process, which illustrate the use of the proposed technique.
AMS Subj. Classification: 62M05, 62P05. 
1 Introduction

In this paper we introduce a family of recursively defined estimators of the 
parameters of a diffusion process. We assume that the process is observed when 
it reaches some trigger values in a particular order. An extensive literature exists 
on estimation for a continuous time record of observations of diffusion processes 
(e.g. Basawa and Rao [5]). Nelson [50] studies the convergence of stochastic
difference equations to stochastic differential equations as the length of discrete
time intervals between observations goes to zero. Banon [4] proposes a recursive
kernel estimate of an initial density for a stationary Markov process. 

The techniques for discretely observed data are somewhat different from those
used for a continuous time record of observations. Maximum likelihood
estimation can be applied to discrete data, although most of the current theory
requires the discretely sampled data to be stationary ergodic Markov chains.
See Billingsley or Hall and Heyde [30] for more complete references. Lo [45]
derived a functional partial differential equation that characterizes the
likelihood function of a discretely sampled Itô process. Likelihood based
estimation is usually computationally quite costly because an auxiliary partial
differential equation must be solved numerically for each hypothetical
parameter value and each observed state. Duffie and Singleton and Gourieroux
et al. suggested the use of numerical methods to approximate moments. He [34]
proposed the use of binomial approximations. Simulation approaches do not
require the Markov state vector to be fully observed. However, it is difficult to
determine the magnitude of the approximation error, and in some applications
it might be numerically costly to ensure that the approximation error is small.
Hansen and Scheinkman [32] proposed moment conditions suitable for use with
generalized method of moments estimators (see Hansen) based on properties
of the infinitesimal generators of stationary Markov processes. Aït-Sahalia [1]
proposed the use of non-parametric techniques for the estimation of stationary
one-dimensional diffusions. Duffie and Glynn [16] introduced a family of
generalized method of moments estimators for continuous time Markov processes
observed at random time intervals. They assume that the arrival of the data
has an intensity that varies with the underlying Markov process or varies with
an independent Markov process. An incomplete list of alternative estimation
procedures includes Aït-Sahalia [2], Gallant and Tauchen [23], Stanton [51],
Bandi and Phillips [3], Chacko and Viceira [11], Singleton [53], Eraker [20] and
Jones [37].

Diffusion processes play a fundamental role in stochastic optimal control the- 
ory, stochastic thermodynamics, and financial economics. We are particularly 
interested in models arising from the financial literature. Examples of these are 
exchange rate models (e.g. see Froot and Obstfeld [22] and Krugman [40]) and
models of the term structure of interest rates (e.g. see Cox et al. and Heath et
al. [35]).

Our goal is to estimate the parameters of a diffusion process. We obtain 
results when the state space is one dimensional. We assume that the differential 
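operator of the diffusion process $(X_t, \mathcal{F}_t, P_x)$ is given by

$$L = \frac{1}{2}\,\sigma^2(x,\theta^*)\,\frac{d^2}{dx^2} + b(x,\theta^*)\,\frac{d}{dx} \tag{1}$$

where $b$, $\sigma^2$ satisfy some technical conditions sufficient for a diffusion with this
differential operator to exist, and $\theta^* \in \mathbb{R}^s$ is a parameter to be estimated. For
any twice continuously differentiable function $f$ on $\mathbb{R}$, it is known that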

$$Lf(x) = \lim_{U \downarrow x} \frac{E_x f(X_{\tau_U}) - f(x)}{E_x \tau_U} \tag{2}$$

where the limit is taken over open sets $U$ containing $x$ and $\tau_U$ denotes the first
exit time from the open set $U$. For the precise meaning of equation (2) see
Dynkin [19]. Therefore, it is natural to use moment conditions based on the
expressions in the numerator and the denominator of equation (2) to construct
estimators of the parameters of the diffusion. It turns out that this approach
suggests parameterizations that are appropriate for identification of the process
from moment conditions of the previous type, in a way that is made precise in
Section 3. We use ideas from stochastic algorithms for the construction of the
estimators. References for the theory of stochastic algorithms are, for instance,
Benveniste et al. [7], Kushner and Clark [42], Duflo [17], Kushner and Yin [44],
and Has'minskii and Nevel'son [33]. We prove that the sequence of estimators
constructed is asymptotically consistent and that an appropriate normalization
of it is asymptotically normal.

Among the advantages of the technique that we propose is that we do not
require the diffusion process to be stationary, to have an invariant probability
measure, or to satisfy some form of ergodicity, as the techniques in prior works
do. Another attractive feature of the proposed estimation is its computational
tractability for any diffusion with continuously differentiable drift and
diffusion coefficients. In fact, we give a closed form for the functions we are
required to compute. Finally, there exists an extensive literature on the theory
of stochastic algorithms, and so it is likely that the ideas used there might be
applied in this context. A particularly appealing characteristic of stochastic
algorithms in the econometrics of financial time series, as Benveniste et al.
recall, is their "generally recognized ability to adapt to variations in the
underlying systems". This could make them useful for the analysis of high
frequency data that seem not to be time homogeneous.

The paper is organized as follows. In Section 2 we state some hypotheses that
are used subsequently and define the estimators that we propose. In Section 3 we
prove the asymptotic consistency of these estimators. In Section 4 we prove that
an appropriate normalization of the sequence of estimators defined in Section 2
is asymptotically normal. In Section 5 we show how the theory we develop can
be applied to some models of interest rates. Namely, we consider the
Cox-Ingersoll-Ross model and the constant elasticity of variance (CEV) process.

2 Construction of Estimators 

Our goal in this paper is to estimate some parameters of a Markov process using
the values of the process that are known at some random times $\tau_1, \tau_2, \tau_3, \ldots$
and $\nu_2, \nu_3, \ldots$ that are related to the process through equations (3) and
(5). Genon-Catalot, Genon-Catalot and Laredo, Genon-Catalot and Laredo [25],
and Genon-Catalot et al. [27] have constructed estimators for the parameters
of a diffusion when only first hitting times are observed. The use of a space
discretization rather than a time discretization is well known in the
probabilistic context; it has been applied in algorithms of path reconstruction.
See Kushner and Dupuis, Milstein, and Milstein and Tretyakov.

We assume that we have a parametric set of diffusions indexed by $\mathbb{H} \subset \mathbb{R}^s$,
where $\mathbb{H}$ is either a compact set or $\mathbb{H} = \mathbb{R}^s$. We define random variables
recursively (see equation (7) and equation (10)) that depend on the data which
we observe. Then we prove that, under some technical conditions, the random
variables defined in this way are asymptotically consistent for the true value
of the parameter when the parameter space is one-dimensional. When the
parameter space is multidimensional we obtain convergence to an invariant set
of an ODE (ordinary differential equation). From now on we assume that
$\Omega = C([0,\infty))$ is the canonical space of continuous $S$-valued functions, where
$S = \mathbb{R}$, $S = [0,\infty)$, or $S = (0,\infty)$, with the metrizable topology of uniform
convergence on compact sets. We denote by $\mathcal{F}_\infty$ the Borel $\sigma$-algebra of $\Omega$. For
$0 \le t < \infty$, $X_t$ is the coordinate mapping process and $\mathcal{F}_t$ the $\sigma$-algebra
generated by $X(\cdot)$ on $[0,t]$; namely $\mathcal{F}_t = \sigma\{X_s, 0 \le s \le t\}$. We define the filtration
$\mathcal{F}_{t+} = \bigcap_{\epsilon>0} \sigma(X_u : \epsilon + t > u)$, for $t \in [0,\infty)$. For the construction of our
estimators we need the following condition:

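Condition 1. $(X_t, \mathcal{F}_t, P_x^\theta)_{\theta\in\mathbb{H}}$ is a parametric set of diffusions with sample
space $(\Omega, \mathcal{F}_\infty)$ and differential operators $(L_\theta)_{\theta\in\mathbb{H}}$. For each $\theta \in \mathbb{H} \subset \mathbb{R}^s$, the
part of the process $(X_t, \mathcal{F}_t, P_x^\theta)$ on $S$ is a recurrent strong Markov process. Also

$$\sup_{(a,b)\subset S,\ x\in(a,b)} E_x^\theta\, \tau^{(a,b)} < \infty$$

where $\tau^{(a,b)}$ is the first exit time of the open set $(a,b)$. We assume that the
functions $\theta \mapsto E_x^\theta Z$ and $\theta \mapsto L_\theta f(x)$ are Borel measurable for all $x \in S$, $Z$ a
random variable defined on $\Omega$, and $f \in C^2(S)$.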

See Dynkin [19] for a definition of recurrence and of the part of a process.
In order to guarantee that a given non-negative second order differential
operator $L$ defined on $C^2(\mathbb{R})$ is the differential operator of a diffusion process, it is
customary to impose:

Condition 2. $L$ is a non-negative second order differential operator with
measurable drift coefficient $b$ and continuous diffusion coefficient $\sigma^2$ that is
uniformly Lipschitz continuous and satisfies either of the following two properties:

1. There exists $c > 0$ such that $\sigma^2(x) \ge c$ for $x \in \mathbb{R}$.

2. $\sigma^2$ is a twice continuously differentiable function such that the second
derivative $d^2\sigma^2/dx^2$ is bounded on $\mathbb{R}$.

If $L$ is as in Condition 2 then there exists a diffusion process $(X_t, \mathcal{F}_t, P_x)$
whose differential operator is $L$. (See Kunita, Corollary 4.2.7.)

Throughout the rest of the paper we shall assume that $(X_t, \mathcal{F}_t, P_x^\theta)_{\theta\in\mathbb{H}}$ is
a parametric set of one-dimensional diffusions with sample space $(\Omega, \mathcal{F}_\infty)$ that
satisfies Condition 1 and with differential operators $(L_\theta)_{\theta\in\mathbb{H}}$ that satisfy
Condition 2. We assume $\theta^* \in \mathbb{H}$ is a fixed constant and $\mu$ is a probability measure
supported on $S$. We denote by $(X_t, \mathcal{F}_t, P)$ the Markov process with initial
probability measure $\mu$ and probability $P_x^{\theta^*}$ for paths starting at $x \in S$; namely
$P = \int P_x^{\theta^*}\, d\mu(x)$.

We assume that we have a finite set $D = \{d_1, \ldots, d_s\} \subset S$ where $d_1 < \cdots < d_s$.
$D$ is a set of states where the process can be observed. We assume that the
data arrival process is given by the following sequence of $(\mathcal{F}_{t+})$ stopping times:

$$\tau_1^D = \inf\{t \ge 0 \mid X_t \in D\}, \qquad \tau_{n+1}^D = \inf\{t > \tau_n^D \mid X_t \in D \setminus \{X_{\tau_n^D}\}\} \ \text{ for } n \ge 1 \tag{3}$$

We suppress $D$ in what follows. Under the hypothesis of recurrence we observe
that $(\tau_n)$ is a sequence of finite stopping times.
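
The stopping times in (3) can be illustrated on a discretely sampled path. The following Python sketch is not part of the construction above. It assumes that the path is only available on a time grid, treats a level of $D$ as hit when two consecutive samples straddle it, and uses the hypothetical helper name `observed_hits`. It returns the successive times at which the path reaches a point of $D$ different from the last one recorded.

```python
def observed_hits(times, path, grid):
    """Approximate the sequence (tau_n, X_{tau_n}) of equation (3).

    times, path: equally long sequences describing a sampled trajectory.
    grid: the finite set D of observable levels.
    A level counts as hit at step i when it lies between path[i-1] and path[i]
    (an assumption of this sketch, since the true path is continuous).
    """
    hits = []
    last = None  # last level of D that was recorded
    for i, x in enumerate(path):
        if i == 0:
            crossed = [d for d in grid if d == x]
        else:
            lo, hi = min(path[i - 1], x), max(path[i - 1], x)
            crossed = [d for d in grid if lo <= d <= hi]
            # visit the crossed levels in the direction of travel
            crossed.sort(reverse=x < path[i - 1])
        for d in crossed:
            if d != last:  # tau_{n+1} requires a level different from X_{tau_n}
                hits.append((times[i], d))
                last = d
    return hits


example = observed_hits(
    times=[0, 1, 2, 3, 4, 5],
    path=[0.0, 0.4, 1.1, 0.9, 0.2, -0.1],
    grid=[0.0, 1.0, 2.0],
)
print(example)  # [(0, 0.0), (2, 1.0), (5, 0.0)]
```

The second crossing of level 1.0 (between times 2 and 3) is ignored, because the definition in (3) only records a level that differs from the previous one.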

Let $\{U_d\}_{d\in D}$ be a finite set of disjoint open connected sets of $S$ such that
$U_d \cap D = \{d\}$ for any $d \in D$. The boundary points of $U_d$ comprise a set of
states where the process can be observed given that the process has reached
the point $d \in D$. Let $D_r, D_l : D \to S \cup \{\infty, -\infty\}$ be the functions satisfying
$U_d = (D_l(d), D_r(d))$. We define $\eta^f : \mathbb{H} \times D \to \mathbb{R}$ by the formula

$$\eta^f(\theta,x) = E_x^\theta f\big(X_{\tau_{D_r(x)} \wedge \tau_{D_l(x)}}\big) \quad \text{for } x \in D \tag{4}$$

where $f : S \to \mathbb{R}$ is a twice continuously differentiable function. Let

$$\nu_{n+1} = \inf\{t > \tau_n \mid X_t \notin U_{X_{\tau_n}}\} \quad \text{for } n \ge 1 \tag{5}$$

We observe that $(\nu_n)_{n\ge 2}$ is a sequence of $(\mathcal{F}_{t+})$ stopping times. We define $V^f$
by the formula:

$$V^f(\theta,x,y) = f(y) - \eta^f(\theta,x) \tag{6}$$

From now on we shall assume that the set of parameters is $\mathbb{H} = \mathbb{R}^s$ or that $\mathbb{H}$ is
the constraint set $\mathbb{H} = \{\theta : a_i \le \theta^i \le b_i\}$, $-\infty < a_i < b_i < \infty$
for $1 \le i \le s$, where $\theta^i$ denotes the $i$th component of $\theta$. It is customary in
the theory of stochastic algorithms to consider a parameter set that is assumed
to be compact, because useful parameter values in applications are confined by
constraints of physics or economics to some constraint set. The constraint set
mentioned above is one such possibility. Other alternatives can be considered.
See Kushner and Yin [44] or the discussion in Section 3. If
$H : \mathbb{H} \times D \to \mathbb{R}^s$ is a measurable map, we define a sequence of estimators by the
recursive relation

$$\Theta_{n+1} = \Pi_{\mathbb{H}}\big[\Theta_n - \gamma_n H(\Theta_n, X_{\tau_n})\, V^f(\Theta_n, X_{\tau_n}, X_{\nu_{n+1}})\big] \tag{7}$$

for $n \ge 1$, where $\Theta_1$ is a bounded random variable taking values in $\mathbb{H}$, $\Pi_{\mathbb{H}}$ is the
projection onto $\mathbb{H}$, and $(\gamma_n)$ is a decreasing sequence of positive numbers with
$\gamma_n \downarrow 0$. In particular, $\Theta_1$ can be a constant, and $\Pi_{\mathbb{H}} = \mathrm{id}$ when $\mathbb{H} = \mathbb{R}^s$. The
meaning of equation (7) is well known in the theory of stochastic algorithms.
A noise corrupted observation $Y_n = H(\Theta_n, X_{\tau_n}) V^f(\Theta_n, X_{\tau_n}, X_{\nu_{n+1}})$ of a vector
valued function $\bar g(\cdot)$ is taken, whose root $\theta^* \in \mathbb{H}$ we are seeking. Actually, one
observes values of the form $Y_n = g(\Theta_n, X_{\tau_n}) + \delta M_n$ where $\delta M_n$ has the property
that $E[\delta M_n \mid Y_i, \delta M_i, i < n] = 0$. Loosely speaking, $Y_n$ is an "estimator" of
$\bar g(\cdot)$ in the sense that $\bar g(\theta) = \lim_m (1/m) \sum_{i=1}^{m} g(\theta, X_{\tau_i})$, where $g(\cdot)$ is a function
based on moment conditions of the type defined by equation (2). The sequence
$(\gamma_n)$ is chosen to provide an implicit average of the iterates.
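
A single update of the type (7) on a box-shaped constraint set can be sketched as follows. This is only an illustration: the function names `project_box` and `recursive_step` are hypothetical, the values of $H$ and $V^f$ are passed in as plain numbers rather than computed from a diffusion, and the projection is the coordinate-wise clipping that corresponds to a constraint set of the form $\{a_i \le \theta^i \le b_i\}$.

```python
def project_box(theta, lower, upper):
    """Projection of theta onto the box {a_i <= theta_i <= b_i}."""
    return [min(max(t, a), b) for t, a, b in zip(theta, lower, upper)]


def recursive_step(theta, gamma, h, v, lower, upper):
    """One update theta - gamma * h * v followed by projection onto the box.

    h: list of floats, standing for the vector H(theta_n, X_{tau_n}).
    v: float, standing for the scalar V^f(theta_n, X_{tau_n}, X_{nu_{n+1}}).
    """
    moved = [t - gamma * hi * v for t, hi in zip(theta, h)]
    return project_box(moved, lower, upper)


# One step from theta = (0.5, 0.5) with gamma = 1, H = (1, -1) and V = 2.
step = recursive_step([0.5, 0.5], 1.0, [1.0, -1.0], 2.0, [0.0, 0.0], [1.0, 1.0])
print(step)  # [0.0, 1.0]
```

Without the projection the step would land at (-1.5, 2.5); the projection returns the nearest point of the box.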

In a similar way, if $g : [0,\infty) \to \mathbb{R}$ is a measurable map, we define
$\eta^g : \mathbb{H} \times D \to \mathbb{R}$ and $V^g : \mathbb{H} \times D \times [0,\infty) \to \mathbb{R}$ by the formulas:

$$\eta^g(\theta,x) = E_x^\theta\, g\big(\tau_{D_r(x)} \wedge \tau_{D_l(x)}\big) \tag{8}$$

$$V^g(\theta,x,y) = g(y) - \eta^g(\theta,x) \tag{9}$$

where we assume that $\{U_d\}_{d\in D}$ is chosen in such a way that $\eta^g < \infty$. For
instance, under the conditions assumed above it is enough to assume that the
sets $\{U_d\}_{d\in D}$ have compact closure. As before, if $H : \mathbb{H} \times D \to \mathbb{R}^s$ is a measurable
map, we define a sequence $(\Theta_n)$ of estimators by the recursive relation

$$\Theta_{n+1} = \Pi_{\mathbb{H}}\big[\Theta_n - \gamma_n H(\Theta_n, X_{\tau_n})\, V^g(\Theta_n, X_{\tau_n}, \nu_{n+1} - \tau_n)\big] \tag{10}$$

for $n \ge 1$, where $\Theta_1$ is a bounded random variable taking values in $\mathbb{H}$. Remarks
similar to the ones made for the meaning of equation (7) hold for equation (10).

Instead of using stochastic algorithms, we could estimate the true parameter
of the process by minimizing the sum of squares

$$Q_n(\theta) = \sum_{k=1}^{n} \big[f(X_{\nu_{k+1}}) - E^\theta\big(f(X_{\nu_{k+1}}) \mid \mathcal{F}_{\tau_k}\big)\big]^2 = \sum_{k=1}^{n} \big[f(X_{\nu_{k+1}}) - \eta^f(\theta, X_{\tau_k})\big]^2$$

with respect to $\theta$. When the parameter space is unconstrained, the estimates
will be taken to be the solution of the system

$$\frac{\partial Q_n(\theta)}{\partial \theta^i} = 0, \qquad i = 1, \ldots, s.$$

The "conditional least squares" approach goes back to Klimko and Nelson, among
others. See also Hall and Heyde [30]. The methodological reason for which we
choose to work with stochastic algorithms instead is their recognized ability to
adapt to variations of the underlying system, as well as their ability to process
data sequentially as they are observed. In the next section we will find conditions
that are sufficient for the sequences of random variables defined by equations (7)
and (10) to be asymptotically consistent.



3 Consistent Estimation 



The next theorem is used to prove asymptotic consistency, in the case of a
one-dimensional parameterization, for the sequences of random variables defined in
equations (7) and (10). Compare with Theorem 7.1 from Kushner and Yin [44].



Theorem 1 (A Robbins-Monro algorithm). Let $(\Omega, \mathcal{F}, P)$ be a probability
space, $(\mathcal{F}_n)$ be a filtration of sub-$\sigma$-algebras of $\mathcal{F}$, $D \subset \mathbb{R}$ be a finite set, and
$(X_n, Y_n, \Theta_n)$ be a sequence of real valued $(\mathcal{F}_n)$ adapted random variables
where $Y_n$ takes values in $D$. Let $\Theta_n$ be defined by the following recursive relation:

$$\Theta_{n+1} = \Theta_n - \gamma_n H(\Theta_n, Y_n)\, V(\Theta_n, Y_n, X_{n+1}) \tag{11}$$

where $V : \mathbb{R} \times D \times \mathbb{R} \to \mathbb{R}$ and $H : \mathbb{R} \times D \to \{1, -1\}$ are measurable functions, $(\gamma_n)$
is a decreasing sequence of positive numbers and $E(\|\Theta_1\|^2) < \infty$. We assume
that the following hypotheses $A_1$, $A_2$, $H_1$, $H_2$, and $H_3$ are satisfied:

$A_1$ There exists a measurable function $\bar V : \mathbb{R} \times D \to \mathbb{R}$ such that

$$E\big(V(\Theta_n, Y_n, X_{n+1}) \mid \mathcal{F}_n\big) = \bar V(\Theta_n, Y_n) \tag{12}$$

$A_2$ There exists a positive measurable function $S^2 : \mathbb{R} \times D \to [0, \infty)$ such that

$$E\big(V^2(\Theta_n, Y_n, X_{n+1}) \mid \mathcal{F}_n\big) = S^2(\Theta_n, Y_n) \tag{13}$$

$H_1$ There exists $\theta^* \in \mathbb{R}$ such that for any $d \in D$ and $\theta \in \mathbb{R}$

$$(\theta - \theta^*)\, H(\theta,d)\, \bar V(\theta,d) \ge 0 \tag{14}$$

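and there exists an increasing sequence of positive integers $(n_k)_{k\in\mathbb{N}}$ such that
for any $\epsilon > 0$

$$\liminf_{k}\ \inf_{\epsilon \le |\theta - \theta^*| \le 1/\epsilon} E\big((\theta - \theta^*)\, H(\theta, Y_{n_k})\, \bar V(\theta, Y_{n_k})\big) > 0 \tag{15}$$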

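$H_2$ There exists $K > 0$ such that

$$S^2(\theta,d) \le K\big(1 + (\theta - \theta^*)^2\big) \quad \text{for all } \theta \in \mathbb{R},\ d \in D \tag{16}$$

$H_3$ The sequence $(\gamma_n)$ of positive numbers satisfies

$$\sum_{k} \gamma_{n_k} = \infty, \qquad \sum_{n} \gamma_n^2 < \infty \tag{17}$$

Then the sequence $(\Theta_n)$ converges almost surely to $\theta^*$.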
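
A recursion of the form (11) can be tried on a toy problem. The sketch below is not taken from the paper: it assumes $H \equiv 1$, $V(\theta, x) = \theta - x$, $\gamma_n = 1/n$, and observations $x_n = \theta^* + \text{noise}$ with unit variance (the set $D$ plays no role here). With these choices $\bar V(\theta) = \theta - \theta^*$ and $S^2(\theta) = (\theta-\theta^*)^2 + 1$, so the sign, growth and step-size hypotheses are easy to check, and the iterate after $n$ observations is exactly their sample mean.

```python
import random


def robbins_monro(observations, theta1=0.0):
    """Iterate theta_{n+1} = theta_n - gamma_n * H * V with gamma_n = 1/n,
    H = 1 and V(theta, x) = theta - x (toy choices made for this sketch)."""
    theta = theta1
    for n, x in enumerate(observations, start=1):
        gamma = 1.0 / n
        theta = theta - gamma * (theta - x)
    return theta


random.seed(0)
theta_star = 2.0
xs = [theta_star + random.gauss(0.0, 1.0) for _ in range(5000)]
estimate = robbins_monro(xs)
sample_mean = sum(xs) / len(xs)
print(estimate, sample_mean)  # the two numbers agree up to rounding
```

The agreement with the sample mean is a property of the step size $1/n$ in this linear toy case; for the nonlinear moment functions of the paper no such closed form is claimed.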

Typically, a family $(X_t, \mathcal{F}_t, P_x^{(\lambda_1,\lambda_2)})_{(\lambda_1,\lambda_2)\in\mathbb{H}_1\times\mathbb{H}_2}$ of scalar diffusions with
differential operators $(L_{(\lambda_1,\lambda_2)})_{(\lambda_1,\lambda_2)\in\mathbb{H}_1\times\mathbb{H}_2}$ is given, where $\mathbb{H}_i \subset \mathbb{R}^{s_i}$, $i = 1,2$,
are compact subsets or $\mathbb{H}_1 \times \mathbb{H}_2 = \mathbb{R}^{s_1} \times \mathbb{R}^{s_2}$, and

$$L_{(\lambda_1,\lambda_2)} = \frac{1}{2}\,\sigma^2(x,\lambda_1)\,\frac{d^2}{dx^2} + b(x,\lambda_2)\,\frac{d}{dx}$$

It turns out that the sampling structures introduced in Section 2
suggest that it is more natural to assume a parameterization defined by the
indexed family of differential operators

$$L_{(\lambda_1',\lambda_2')} = \frac{1}{2}\,\sigma^2(x,\lambda_1')\,\frac{d^2}{dx^2} + (b/\sigma^2)(x,\lambda_2')\,\sigma^2(x,\lambda_1')\,\frac{d}{dx} \tag{18}$$

where $\mathbb{H}_i' \subset \mathbb{R}^{s_i}$, $i = 1,2$, are compact subsets or $\mathbb{H}_1' \times \mathbb{H}_2' = \mathbb{R}^{s_1} \times \mathbb{R}^{s_2}$,
and where $\sigma^2$, $b/\sigma^2$ are parameterizations of the diffusion and the ratio between
the drift and the diffusion respectively. See equations (60) and (62). It is often
the case that the latter parameterization defines an equivalent problem to the
former parameterization, at least as far as estimation is concerned. Indeed,
according to Itô and McKean, who borrow a phrase of W. Feller, the expression in
the numerator of equation (2) defines a road map, i.e. it tells what routes the
particle is permitted to travel, and the expression in the denominator of equation (2)
defines the speed of the diffusion. Using Feller's terminology, $\lambda_2'$ identifies the
"road map", and $\lambda_1'$ identifies the "speed" of the diffusion when the "road map"
is known. In this paper we shall adopt the latter approach. Corollary 1
below is used to estimate the parameter that identifies the ratio between the
drift and the diffusion when the parameter space used to identify this ratio is
$\mathbb{R}$. Theorem 2 is used when a compact subset of $\mathbb{R}^{s_1}$ is used as the parameter
space that identifies this ratio. Let us observe that neither case requires the
parameter(s) that identify the diffusion to be known. Thus, it is possible to
assume that the latter parameter is known when the former parameters of the
diffusion are estimated.

For the next two corollaries, let us assume that the parameter space is one 
dimensional. 

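Corollary 1. Let $\sigma^2 : S \times \mathbb{R} \to [0,\infty)$ and $b : S \times \mathbb{R} \to \mathbb{R}$ be defined by the
formula

$$L_\lambda = \frac{1}{2}\,\sigma^2(\cdot,\lambda)\,\frac{d^2}{dx^2} + b(\cdot,\lambda)\,\frac{d}{dx} \quad \text{for } \lambda \in \mathbb{R} \tag{19}$$

where $b(\cdot,\lambda)$, $\sigma^2(\cdot,\lambda)$, $(b/\sigma^2)(\cdot,\lambda) \in C(S) \cap C^2(S)$ for any $\lambda \in \mathbb{R}$. We assume
that $\partial/\partial\lambda\,(b/\sigma^2)(x,\lambda)$ exists and is nowhere zero for $(x,\lambda) \in \bigcup_{d\in D} U_d \times \mathbb{R}$.
Let $H : \mathbb{R} \times D \to \{-1, 1\}$ be defined by the formula

$$H(\lambda,d) = 1_{(0,\infty)}\big(\partial/\partial\lambda\,(b/\sigma^2)(d,\lambda)\big) - 1_{(-\infty,0)}\big(\partial/\partial\lambda\,(b/\sigma^2)(d,\lambda)\big) \tag{20}$$

If $\lambda^* \in \mathbb{R}$ is a fixed number, then the sequence of random variables defined
by equation (7) converges almost surely to $\lambda^*$ for any $x \in S$, where $\eta^f$ and $V^f$
are defined as in equations (4) and (6) and $f = \mathrm{id}$ is the identity on $\mathbb{R}$.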

A few words are needed to review the hypotheses of Corollary 1. If the
drift is zero, the sampling scheme defined by equation (7) cannot be used. In
fact, only data obtained using the sampling scheme defined by equation (10)
would provide any information. See Corollary 2 below. Within the framework
proposed this is indeed natural. If the drift is zero, it is conceivable that only
the times between hits of the grid and the end points of the surrounding
intervals should provide any information. If $(b/\sigma^2)(d,\cdot)$, $d \in D$, are strictly monotone
functions on an interval containing the "true" parameter, then it is possible
to define a new parameterization that complies with the hypothesis of the
previous corollary and allows us to identify the parameter at least on a small
interval. Also, Theorem 2 can be used whenever $(b/\sigma^2)(d,\cdot)$, $d \in D$, are not strictly
monotone.
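
The role of the exit-position moment can be seen in a toy case that is not treated in the text: $dX_t = \lambda\,dt + dW_t$, so that $b/\sigma^2 = \lambda$, observed when it leaves $(d-h, d+h)$ after starting at $d$. For this process the exit occurs at the upper end with probability $1/(1+e^{-2\lambda h})$, hence the mean exit position is $d + h\tanh(\lambda h)$. The sketch below makes several simplifying assumptions: every observation is an independent exit from the same point $d$ (the recurrence hypothesis is not checked, and Brownian motion with drift is not recurrent), the step size is $2/n$, and the sign of the update is chosen by hand so that the step moves toward the root when the moment function is increasing in $\lambda$. All function names are hypothetical.

```python
import math
import random


def eta(lam, d, h):
    """Mean exit position from (d - h, d + h), started at d, for dX = lam dt + dW."""
    return d + h * math.tanh(lam * h)


def sample_exit(lam, d, h, rng):
    """Exact draw of the exit position: upper end with prob 1/(1 + exp(-2*lam*h))."""
    p_up = 1.0 / (1.0 + math.exp(-2.0 * lam * h))
    return d + h if rng.random() < p_up else d - h


def estimate_drift(lam_true, d=0.0, h=1.0, steps=20000, bound=2.0, seed=1):
    rng = random.Random(seed)
    lam = 0.0
    for n in range(1, steps + 1):
        y = sample_exit(lam_true, d, h, rng)
        # Sign chosen in this sketch so that the step moves toward the root
        # of eta(lam) = E[y], given that eta is increasing in lam.
        lam = lam + (2.0 / n) * (y - eta(lam, d, h))
        lam = min(max(lam, -bound), bound)  # projection onto [-bound, bound]
    return lam


lam_hat = estimate_drift(0.5)
print(lam_hat)  # should be close to 0.5
```

When $\lambda = 0$ the moment function equals $d$ for every value of the diffusion coefficient, which is the situation described above in which exit positions carry no information.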

Corollary 2 below is used to estimate the parameter that identifies the
diffusion, when the parameter space to identify this diffusion is $\mathbb{R}$. See Theorem
2 for estimation of parameters used to identify the diffusion term for a
multidimensional setting for the parameter space. It is assumed that the vector
of parameter(s) that identifies the ratio between the drift and the diffusion is
known. The previous assumption can be made in view of Corollary 1 or Theorem
2 in conjunction with the remarks made right after Theorem 1.

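Corollary 2. Let $\sigma^2 : S \times \mathbb{R} \to [0,\infty)$ and $b : S \times \mathbb{R} \to \mathbb{R}$ be defined by the
formula

$$L_\varsigma = \frac{1}{2}\,\sigma^2(\cdot,\varsigma)\,\frac{d^2}{dx^2} + b(\cdot,\varsigma)\,\frac{d}{dx} \quad \text{for } \varsigma \in \mathbb{R} \tag{21}$$

where $b(\cdot,\varsigma)$, $\sigma^2(\cdot,\varsigma)$, $(b/\sigma^2)(\cdot,\varsigma) \in C(S) \cap C^2(S)$ for any $\varsigma \in \mathbb{R}$. We assume:

1. There exists a function $s : S \to \mathbb{R}$, that does not depend on $\varsigma$, such that

$$\frac{b(x,\varsigma)}{\sigma^2(x,\varsigma)} = s(x) \quad \text{for any } x \in S,\ \varsigma \in \mathbb{R} \tag{22}$$

2. There exist $\sigma_0 : \mathbb{R} \to \mathbb{R}_+$, $h : S \to \mathbb{R}$ such that

$$\sigma^2(x,\varsigma) = \sigma_0(\varsigma)\, h(x) \quad \text{for any } x \in S,\ \varsigma \in \mathbb{R} \tag{23}$$

where we assume that $\sigma_0$ is a strictly increasing function that is
differentiable and

$$\liminf_{|\varsigma| \to \infty} |\varsigma|\, \sigma_0(\varsigma) > 0 \tag{24}$$

If $\varsigma^* \in \mathbb{R}$ is a fixed number, then the sequence of random variables defined
by equation (10) converges almost surely to $\varsigma^*$ for any $x \in S$, where $\eta^g$
and $V^g$ are defined as in equations (8) and (9) for $g$ the identity on $\mathbb{R}_+$ and
$H : \mathbb{R} \times D \to \{-1, 1\}$ is the constant function equal to 1.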

Let us review the hypotheses of Corollary 2. Equation (22) is natural under
the assumptions made on the parameterization. See the remarks made after
Theorem 1. The factorization of equation (23) often arises in applications.
The latter assumption is used to prove monotonicity of the function defined by
equation (43). The assumption made in equation (24) can be made without
any loss of generality.
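
A toy instance of the factorization (23) is $dX_t = \sqrt{\varsigma}\,dW_t$, with $s \equiv 0$, $h \equiv 1$ and $\sigma_0(\varsigma) = \varsigma$ restricted to positive values. The mean exit time from $(d-a, d+a)$ started at $d$ is then $a^2/\varsigma$, which is decreasing in $\varsigma$. The sketch below runs an update of the type (10) with $g = \mathrm{id}$ and $H = 1$ under assumptions that are not in the text: the diffusion is replaced by a symmetric random walk on a lattice of mesh $a/m$ with time step $(a/m)^2/\varsigma$ (its mean exit time is exactly $a^2/\varsigma$), the step size is $8/n$, and the iterates are projected onto a hypothetical interval $[0.5, 8]$ to keep them positive.

```python
import random


def mean_exit_time(sig, a):
    """Mean exit time a**2 / sig for dX = sqrt(sig) dW leaving (d - a, d + a)."""
    return a * a / sig


def sample_exit_time(sig, a, m, rng):
    """Exit time of a symmetric lattice walk (mesh a/m, time step (a/m)**2 / sig).

    The walk stands in for the diffusion; its mean exit time is a**2 / sig.
    """
    pos, steps = 0, 0
    while abs(pos) < m:
        pos += 1 if rng.random() < 0.5 else -1
        steps += 1
    return steps * (a / m) ** 2 / sig


def estimate_variance(sig_true, a=1.0, m=10, n_obs=4000, lo=0.5, hi=8.0, seed=2):
    rng = random.Random(seed)
    sig = 1.0
    for n in range(1, n_obs + 1):
        t = sample_exit_time(sig_true, a, m, rng)
        # update of type (10) with g = id and H = 1, then projection on [lo, hi]
        sig = sig - (8.0 / n) * (t - mean_exit_time(sig, a))
        sig = min(max(sig, lo), hi)
    return sig


sig_hat = estimate_variance(2.0)
print(sig_hat)  # should be close to 2.0
```

An exit time longer than expected under the current iterate lowers the estimate of $\varsigma$, which is the direction dictated by the monotonicity of the mean exit time.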

In order to illustrate the use of stochastic algorithms for the problem of
estimation in the multidimensional case (for the parameter space), we make use of
the standard theorem of convergence for truncated stochastic algorithms with
correlated noise and step size going to zero. Assume a constrained
multidimensional parameter space $\mathbb{H} = \{\theta : a_i \le \theta^i \le b_i\}$, $-\infty < a_i < b_i < \infty$ for
$1 \le i \le s$. For $\theta \in \mathbb{H}$, define the set $C(\theta)$ as follows. For $\theta \in \mathbb{H}^\circ$, the interior of
$\mathbb{H}$, $C(\theta)$ contains only the zero element; for $\theta \in \partial\mathbb{H}$, the boundary of $\mathbb{H}$, let $C(\theta)$
be the infinite convex cone generated by the outer normals at $\theta$ of the faces on
which $\theta$ lies. Given a continuous $g : \mathbb{R}^s \to \mathbb{R}^s$, the projected ODE of $\dot\theta = g(\theta)$ is
defined to be

$$\dot\theta = g(\theta) + z, \qquad z(t) \in -C(\theta(t))$$

where $z(\cdot)$ is the projection or constraint term, the minimum term needed to
keep $\theta(\cdot)$ in $\mathbb{H}$.
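
For a box-shaped constraint set, the effect of the constraint term $z$ can be visualized with a simple numerical scheme: take an Euler step of the unconstrained field and project the result back onto the box. The sketch below is only an illustration under that assumption; the vector field, which pulls toward a hypothetical point outside the box, and the name `projected_euler` are not from the text.

```python
def projected_euler(g, theta0, lower, upper, dt=0.01, steps=2000):
    """Euler scheme for theta' = g(theta) + z on a box: step, then project."""
    theta = list(theta0)
    for _ in range(steps):
        drift = g(theta)
        theta = [min(max(t + dt * gi, a), b)
                 for t, gi, a, b in zip(theta, drift, lower, upper)]
    return theta


# Hypothetical field pulling toward (2, 0.5); the box is [0, 1] x [0, 1].
target = (2.0, 0.5)
limit = projected_euler(lambda th: [target[0] - th[0], target[1] - th[1]],
                        [0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
print(limit)  # first coordinate stops at the face 1.0, second approaches 0.5
```

The trajectory settles at the boundary point (1, 0.5), where the outward component of the field is cancelled by the constraint term. This point is stationary for the projected dynamics although the unconstrained field does not vanish there.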

Theorem 2. Let $(Y_n, \Theta_n)_{n\in\mathbb{N}}$ be a sequence of $(\mathcal{F}_{\tau_n})$ adapted measurable maps
where $Y_n : (\Omega, \mathcal{F}_{\tau_n}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and $\Theta_n : (\Omega, \mathcal{F}_{\tau_n}) \to (\mathbb{R}^s, \mathcal{B}(\mathbb{R}^s))$. Let $K$ be a
non-singular $s \times s$ matrix. Assume that $(Y_n, \Theta_n)$ satisfies the following recursive
relation:

$$\Theta_{n+1} = \Pi_{\mathbb{H}}\big[\Theta_n - \gamma_n K \nabla\bar V(\Theta_n, X_{\tau_n})\, V(\Theta_n, X_{\tau_n}, Y_{n+1})\big] \tag{25}$$

where $\Pi_{\mathbb{H}}$ is the projection onto $\mathbb{H}$, $V : \mathbb{H} \times D \times \mathbb{R} \to \mathbb{R}$ is a measurable function,
$\bar V : \mathbb{H} \times D \to \mathbb{R}$ is a twice continuously differentiable function with differential
$\nabla\bar V(\cdot,d)$ for $d \in D$, $K$ is an invertible matrix and $(\gamma_n)$ is a decreasing sequence
of positive numbers. We assume that

$$E\big(V(\Theta_n, X_{\tau_n}, Y_{n+1}) \mid \mathcal{F}_{\tau_n}\big) = \bar V(\Theta_n, X_{\tau_n}) \quad \text{for } n \ge 1 \tag{26}$$

Moreover, it is assumed that the sequence $(\gamma_n)$, $\gamma_n \downarrow 0$, of positive numbers
satisfies

$$\sum_{n} \gamma_n = \infty, \qquad \sum_{n} \gamma_n^2 < \infty. \tag{27}$$

Then the sequence $(\Theta_n)$ converges almost surely $P$ to an invariant set of the
projected ODE

$$\dot\theta = -K g(\theta) + z, \qquad z(t) \in -C(\theta(t)) \tag{28}$$

for

$$g(\theta) = \frac{1}{2} \sum_{d\in D} p_d\, \nabla \bar V^2(\theta,d)$$

where $p = (p_d)$ is the left-fixed probability row vector for the Markov chain
$(X_{\tau_n}, \mathcal{F}_{\tau_n})$. Indeed, $(\Theta_n)$ converges almost surely to a unique compact and
connected component of the set of stationary points of equation (28). If $\theta^*$ is
an asymptotically stable point of equation (28) and $(\Theta_n)$ is in some compact
set in the domain of attraction of $\theta^*$ infinitely often with probability $\ge \rho$, then
$\Theta_n \to \theta^*$ with at least probability $\rho$.

The proof of the above theorem is a straightforward consequence of the
theorem of convergence with probability one for the correlated noise case for
stochastic algorithms. See for example Kushner and Yin [44], Theorem 6.1.1.
The details of the proof are left to the reader.

More general constraint sets can be considered. For instance, let $q_i(\cdot)$,
$i = 1, \ldots, p$, be continuously differentiable real-valued functions on $\mathbb{R}^s$, with
gradients $\nabla q_i(\cdot)$, where it is assumed that $\nabla q_i(x) \ne 0$ if $q_i(x) = 0$ and that
$\mathbb{H} = \{x \mid q_i(x) \le 0,\ i = 1, \ldots, p\}$ is a nonempty, compact connected set.
Define $C(x)$ to be the convex cone generated by the set of outward normals
$\{\nabla q_i(x) \mid q_i(x) = 0\}$. Suppose that for each $x$ the set $\{\nabla q_i(x) \mid q_i(x) = 0\}$ is
either empty or linearly independent. Then Theorem 2 remains true
with the obvious changes. See Kushner and Yin [44]. Similarly, if $\mathbb{H}$ is an $\mathbb{R}^{s-1}$
dimensional connected compact surface with a continuously differentiable outer
normal, and we define $C(x)$, $x \in \mathbb{H}$, to be the linear span of the outer normal at
$x$, then Theorem 2 still holds. See also Kushner and Yin [44]. It is worth noting
that the former constraint set, as well as the one mentioned in Theorem 2, can
give rise to new stationary points of the ODE (28), but this is the only type
of singular point that can be introduced by the constraints. In many
applications, when the truncation bounds are large enough, there is only one stationary
point $\theta^*$ of the ODE (28) that is globally asymptotically stable. Typically,
for the kind of application we have in mind, $\bar V(\theta,d) = \eta(\theta^*,d) - \eta(\theta,d)$ where
$\eta : \mathbb{H} \times D \to \mathbb{R}$ is a twice continuously differentiable function (in the parameter
variable) and $\theta^* \in \mathbb{H}^\circ$. If



defines an injection, then $\theta^*$ is the unique stationary point of equation (28) in
the interior of $\mathbb{H}$, at least for a sufficiently small neighborhood of $\theta^*$.

As an application of Theorem 2, let us assume a family of scalar diffusions
$(X_t, \mathcal{F}_t, P_x^{(\lambda_1,\lambda_2)})_{(\lambda_1,\lambda_2)\in\mathbb{H}_1\times\mathbb{H}_2}$ with differential operators $(L_{(\lambda_1,\lambda_2)})_{(\lambda_1,\lambda_2)\in\mathbb{H}_1\times\mathbb{H}_2}$
given by equation (18), where $\mathbb{H}_i \subset \mathbb{R}^{s_i}$, $i = 1,2$, are constraint sets like the
ones discussed above. It is assumed that there exists a parameter
$(\lambda_1^*, \lambda_2^*) \in \mathbb{H}_1 \times \mathbb{H}_2$ such that $(P_x^{(\lambda_1^*,\lambda_2^*)}) = (P_x)$. We consider the
family of diffusions $(X_t, \mathcal{F}_t, P_x^{(\lambda,\lambda_2^*)})_{\lambda\in\mathbb{H}_1}$. Let $(\Theta_n)$ be defined by equation (7),
where the projection is taken over the set $\mathbb{H}_1$,
$\eta^f(\lambda,x) = E_x^{(\lambda,\lambda_2^*)} f\big(X_{\tau_{D_r(x)} \wedge \tau_{D_l(x)}}\big)$,
$V^f(\lambda,x,y) = f(y) - \eta^f(\lambda,x)$, $\bar V^f(\lambda,x) = \eta^f(\lambda_1^*,x) - \eta^f(\lambda,x)$, $K$ is a
non-singular matrix and $(\gamma_n)$ is a sequence as in equation (27). It follows that
Theorem 2 applies, and it identifies $\lambda_1^*$ if this is the unique stationary point
of the projected ODE (28). We observe that the computation made to
obtain the sequence $(\Theta_n)$ does not depend on the value $\lambda_2^*$. See the Appendix
for the computation of the algorithms. Next, we assume that the
parameter $\lambda_1^*$ is known. (The latter can be assumed by the previous remark.) We
consider the collection of diffusions $(X_t, \mathcal{F}_t, P_x^{(\lambda_1^*,\lambda')})_{\lambda'\in\mathbb{H}_2}$. Let $(\Theta_n)$ be the
sequence of estimators defined by equation (10), where the projection is taken
over the set $\mathbb{H}_2$, $\eta^g(\lambda',x) = E_x^{(\lambda_1^*,\lambda')} g\big(\tau_{D_r(x)} \wedge \tau_{D_l(x)}\big)$, $V^g(\lambda',x,y) = g(y) - \eta^g(\lambda',x)$,
$\bar V^g(\lambda',x) = \eta^g(\lambda_2^*,x) - \eta^g(\lambda',x)$, and $K$ is a non-singular matrix (not necessarily
identical to the one used to compute the previous sequence). Then Theorem 2 applies, and it identifies
$\lambda_2^*$ if this is the unique stationary point of the projected ODE (28).

It is worth noting that even if only data associated with the sampling scheme
related to equation (7) are available, at least the ratio between the drift and
the diffusion can be identified. Also, when the parameter space that identifies
either the diffusion or the ratio between the drift and the diffusion is
one-dimensional, Corollaries 1 and 2 can be called upon for the estimation, with
the advantage that complete identification of the parameter is easier.

4 Asymptotic Normality 

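In this section we propose a version of the central limit theorem for the class of
estimators of Theorem 1.

For any stopping time $\tau$ we denote as $\theta_\tau$ the measurable map defined as
$\theta_\tau(\omega)(\cdot) = \omega(\tau(\omega) + \cdot)$. We observe that $\theta_{\tau_n} = \theta_{\tau_2}^{\,n-1}$ and
$X_{\tau_n} = X_{\tau_1} \circ \theta_{\tau_n} = X_{\tau_1} \circ \theta_{\tau_2}^{\,n-1}$ for $n \ge 2$.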
For the following theorem we assume that $(X_t, \mathcal{F}_t, P_x^\theta)_{\theta\in\mathbb{R}}$ is a parametric
family of recurrent strong Markov processes.

Theorem 3. Let $Y : (\Omega, \mathcal{F}_{\tau_2}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ be a measurable map that is bounded
below. Moreover, assume that $Y \in \bigcap_{d\in D} L^2(P_d)$. Let $Y_n$ be defined as
$Y_n = Y \circ \theta_{\tau_{n-1}} = Y \circ \theta_{\tau_2}^{\,n-2}$ for $n \ge 3$, and $Y_2 = Y$. Let $\eta : \mathbb{R} \times D \to \mathbb{R}$ be the function
defined as $\eta(\theta, d) = E_d^\theta(Y)$. In addition assume that Hypotheses $N_1$ and $N_2$ are
satisfied:

$N_1$ For any $d \in D$, $\eta(\cdot,d)$ is a strictly monotone, twice continuously differentiable
function with non-vanishing derivative.

$N_2$ There exist $L, L' > 0$ such that for any $\theta \in \mathbb{R}$, $d \in D$

$$|\eta(\theta,d) - \eta(\theta^*,d)| \le L\,|\theta - \theta^*| + L' \tag{29}$$

Define $V : \mathbb{R} \times D \times \mathbb{R} \to \mathbb{R}$ by the formula $V(\theta, d, y) = y - \eta(\theta, d)$. Assume that
$(\Theta_n^N)$ is an $(\mathcal{F}_{\tau_n})$ adapted sequence of random variables that satisfies the recursive
relation:

$$\Theta_{n+1}^N = \Theta_n^N - \frac{1}{n}\,\frac{V(\Theta_n^N, X_{\tau_n}, Y_{n+1})}{a(X_{\tau_n})} \tag{30}$$

where $a(d) = -(\partial\eta/\partial\theta)(\theta^*,d)$ for $d \in D$ and $E((\Theta_1^N)^2) < \infty$.
Then $n^{1/2}(\Theta_n^N - \theta^*)$ is asymptotically normally distributed with mean zero and
variance $\sigma^2 = \sum_{d\in D} p_d \operatorname{Var}_d(Y)/a^2(d) = \sum_{d\in D} p_d\, E_d(Y - \eta(\theta^*, d))^2/a^2(d)$, where
$p = (p_d)$ is the left-fixed probability row vector of the Markov chain $(X_{\tau_n}, \mathcal{F}_{\tau_n})$.
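
The variance formula of Theorem 3 combines the left-fixed vector $p$ of the chain $(X_{\tau_n})$ with the per-state variances and slopes. The sketch below evaluates it for made-up inputs: the three-point transition matrix, the variances and the slopes are hypothetical numbers, and the helper names are not from the text. Because a chain that moves between neighbouring grid points is periodic, the fixed vector is computed by iterating the lazy chain $(I+P)/2$, which has the same left-fixed vector.

```python
def left_fixed_vector(P, sweeps=200):
    """Left-fixed probability vector p = pP of an irreducible transition matrix.

    Iterates the lazy chain (I + P)/2, which has the same fixed vector and
    does not oscillate when the original chain is periodic.
    """
    k = len(P)
    p = [1.0 / k] * k
    for _ in range(sweeps):
        pP = [sum(p[i] * P[i][j] for i in range(k)) for j in range(k)]
        p = [0.5 * (p[j] + pP[j]) for j in range(k)]
    return p


def asymptotic_variance(p, variances, slopes):
    """sigma^2 = sum_d p_d * Var_d(Y) / a(d)**2."""
    return sum(pd * v / (a * a) for pd, v, a in zip(p, variances, slopes))


# Hypothetical three-point grid: the end points always move to the middle point.
P = [[0.0, 1.0, 0.0],
     [0.3, 0.0, 0.7],
     [0.0, 1.0, 0.0]]
p = left_fixed_vector(P)
sigma2 = asymptotic_variance(p, variances=[1.0, 2.0, 1.0], slopes=[-1.0, -2.0, -1.0])
print(p, sigma2)  # p is approximately [0.15, 0.5, 0.35], sigma2 approximately 0.75
```

In an application the variances $\operatorname{Var}_d(Y)$ and the slopes $a(d)$ would have to come from the model at $\theta^*$; here they are placeholders.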

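Corollary 3. Let $\mu$ be a probability measure on $\mathbb{R}$ supported on $S$, $\lambda^* \in \mathbb{R}$ be a fixed
constant, and let $(\Omega, P, \mathcal{F}_\infty)$ be the probability space where $P = \int P_x^{\lambda^*}\, d\mu(x)$.
Let $b$, $\sigma^2$, $b/\sigma^2$ and $(L_\lambda)$ be as in Corollary 1. Let $\eta : \mathbb{R} \times D \to \mathbb{R}$ be defined as
$\eta(\lambda,d) = E_d^\lambda(X_{\nu_2})$. We define $V : \mathbb{R} \times D \times \mathbb{R} \to \mathbb{R}$ by the formula
$V(\lambda,d,y) = y - \eta(\lambda,d)$. Assume that $(\Theta_n^N)$ is an $(\mathcal{F}_{\tau_n})$ adapted sequence of random variables
which satisfies the recursive relation

$$\Theta_{n+1}^N = \Theta_n^N - \frac{1}{n}\,\frac{V(\Theta_n^N, X_{\tau_n}, X_{\nu_{n+1}})}{a(X_{\tau_n})} \tag{31}$$

where $a(d) = -(\partial\eta/\partial\lambda)(\lambda^*, d)$ for $d \in D$ and $\Theta_1^N$ is a bounded random variable.
Then $n^{1/2}(\Theta_n^N - \lambda^*)$ is asymptotically normally distributed with mean zero and
variance $\sigma^2 = \sum_{d\in D} p_d\, E_d(X_{\nu_2} - \eta(\lambda^*, d))^2/a^2(d)$, where $p = (p_d)$ is the left-fixed
probability row vector of the Markov chain $(X_{\tau_n}, \mathcal{F}_{\tau_n})$.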

Corollary 4. Let $\mu$ be a probability measure on $\mathbb{R}$ supported on $S$, $\varsigma^* \in \mathbb{R}$
be a fixed constant, and let $(\Omega, P, \mathcal{F}_\infty)$ be the probability space where
$P = \int P_x^{\varsigma^*}\, d\mu(x)$. We assume that $b$, $\sigma^2$, $(L_\varsigma)$, $s$, $\sigma_0$, and $h$ satisfy the
hypotheses of Corollary 2. Let $\tilde\eta : \mathbb{R} \times D \to \mathbb{R}$ be defined as $\tilde\eta(\varsigma, d) = E_d^\varsigma(\nu_2)$. We define
$V : \mathbb{R} \times D \times \mathbb{R} \to \mathbb{R}$ by the formula $V(\varsigma, d, y) = y - \tilde\eta(\varsigma, d)$. Assume that $(\Theta_n^N)$ is
an $(\mathcal{F}_{\tau_n})$ adapted sequence of random variables that satisfies the recursive relation

$$\Theta_{n+1}^N = \Theta_n^N - \frac{1}{n}\,\frac{V(\Theta_n^N, X_{\tau_n}, \nu_{n+1} - \tau_n)}{a(X_{\tau_n})} \tag{32}$$

where $a(d) = -(\partial\tilde\eta/\partial\varsigma)(\varsigma^*, d)$ for $d \in D$ and $\Theta_1^N$ is a bounded random variable.
Then $n^{1/2}(\Theta_n^N - \varsigma^*)$ is asymptotically normally distributed with mean zero and
variance $\sigma^2 = \sum_{d\in D} p_d\, E_d(\nu_2 - \tilde\eta(\varsigma^*, d))^2/a^2(d)$, where $p = (p_d)$ is the left-fixed
probability row vector of the Markov chain $(X_{\tau_n}, \mathcal{F}_{\tau_n})$.

In order to illustrate the use of stochastic algorithms for the problem of
asymptotic normality when the parameter space is multidimensional,
we use a standard theorem on the rate of convergence of stochastic algorithms
with correlated noise and decreasing step size. We assume a constrained
multidimensional parameter space $H = \{\theta : a_i \le \theta_i \le b_i\}$, $-\infty < a_i < b_i < \infty$
for $1 \le i \le s$. We assume that $(X_t, \mathcal{F}_t, P_x^{\theta})_{\theta \in H}$ is a parametric family of
recurrent strong Markov processes with sample space $(\Omega, \mathcal{F}_\infty)$. Let $\theta^* \in H$ be an
interior point, and let $\mu$ be a probability measure supported on $S$. We denote by
$(X_t, \mathcal{F}_t, P)$ the Markov process with parameter $\theta^*$ and initial probability
measure $\mu$. Let $Y$, $Y_n$, $n \ge 2$, be defined as in Theorem^ and define $\eta: H \times D \to \mathbb{R}$
as $\eta(\theta, d) = E_d^{\theta}(Y)$. We assume a recursive sequence of estimators $(\Theta_n)$ defined
as

$$\Theta_{n+1} = \Theta_n + \frac{1}{n}\, K \nabla\eta(\Theta_n, X_{\tau_n})\,\big(Y_{n+1} - \eta(\Theta_n, X_{\tau_n})\big) \qquad (33)$$
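
To make the recursion (33) concrete, here is a minimal scalar simulation sketch. Everything model-specific is an assumption made for illustration: the moment function `eta(theta, d) = theta * d`, the two trigger values, the Gaussian noise in `y`, and the use of independent draws as a stand-in for the chain $X_{\tau_n}$ are not taken from the paper, and the constraint to $H$ is ignored.

```python
import random

def eta(theta, d):
    """Hypothetical moment function eta(theta, d) = theta * d (an assumption)."""
    return theta * d

def grad_eta(theta, d):
    """Derivative of the hypothetical eta with respect to theta."""
    return d

def run_recursion(theta_star, theta_init, k_gain, n_steps, seed=0):
    """Iterate theta <- theta + (K/n) * grad_eta * (y - eta) with step size 1/n."""
    rng = random.Random(seed)
    states = [1.0, 2.0]                      # hypothetical trigger values D
    theta = theta_init
    for n in range(1, n_steps + 1):
        d = rng.choice(states)               # stand-in for X_{tau_n} (i.i.d. here)
        y = eta(theta_star, d) + rng.gauss(0.0, 0.5)   # stand-in for Y_{n+1}
        theta += (k_gain / n) * grad_eta(theta, d) * (y - eta(theta, d))
    return theta

theta_hat = run_recursion(theta_star=1.5, theta_init=0.0, k_gain=1.0, n_steps=20000)
```

With these choices the scalar analogue of $KA$ is $K\,E(d^2) = 2.5$, which is above 1/2, so the iterate settles near the hypothetical true value 1.5.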



Let $D(-\infty,\infty)$ (resp., $D[0,\infty)$) denote the space of real-valued functions
on the interval $(-\infty,\infty)$ (resp., on $[0,\infty)$) that are right continuous and have
left-hand limits, endowed with the Skorohod topology, and $D^s(-\infty,\infty)$ its $s$-fold
product. Full descriptions and treatments of the Skorohod topology are
given in Billingsley ^H] and Ethier and Kurtz [23]. Define $U_n = \sqrt{n}(\Theta_n - \theta^*)$,
and let $U^n(\cdot)$ denote the piecewise constant right continuous interpolation (with
interpolation intervals $\{1/n\}$) of the sequence $\{U_i, i \ge n\}$ on $[0,\infty)$. Namely, if
we define $t_0 = 0$ and $t_n = \sum_{i=1}^{n} 1/i$, we set

$$U^n(t) = U_m \quad \text{for } t \in [t_{m-1} - t_{n-1},\, t_m - t_{n-1}), \quad m \ge n \ge 1.$$

For $t \ge 0$, let $m(t)$ denote the unique value of $n$ such that $t \in [t_{n-1}, t_n)$, and
for $t < 0$ set $m(t) = 1$. Define the continuous time interpolation $W^n(\cdot)$ on
$(-\infty,\infty)$, for $n \ge 1$, by

$$W^n(t) = \begin{cases} \displaystyle\sum_{i=n}^{m(t_n+t)-1} \frac{1}{\sqrt{i}}\,(K\nabla\eta)(\theta^*, X_{\tau_i})\,\big(Y_{i+1} - \eta(\theta^*, X_{\tau_i})\big) & \text{for } t \ge 0, \\[2ex] \displaystyle -\sum_{i=m(t_n+t)}^{n-1} \frac{1}{\sqrt{i}}\,(K\nabla\eta)(\theta^*, X_{\tau_i})\,\big(Y_{i+1} - \eta(\theta^*, X_{\tau_i})\big) & \text{for } t < 0. \end{cases} \qquad (34)$$

Theorem 4. Let $Y$ and $Y_n$ be defined as in Theorem^. Assume the algorithm
given by equation (33), where $K$ is a nonsingular symmetric positive definite
matrix. Let $\theta^*$ be an isolated stable point of the ODE (28) in the interior of
$H$, and assume that $(\Theta_n)$ converges almost surely to the process with constant
value $\theta^*$. Assume that the functions $\eta(\cdot, d)$, defined as above for $d \in D$, are twice
continuously differentiable. Assume that the Hessian matrix

$$A = D^2_\theta\Big(\sum_{d \in D} p_d\,\big(\eta(\theta, d) - \eta(\theta^*, d)\big)^2\Big)$$

is positive definite. Moreover, assume that the eigenvalues of the matrix $KA$
are greater than 1/2 (in particular the matrix $(-KA + I/2)$ is negative definite).
Then the sequence $(U^n(\cdot), W^n(\cdot))$ converges weakly in $D^r[0,\infty) \times D^r(-\infty,\infty)$
to a limit process $(U(\cdot), W(\cdot))$, where $W(\cdot)$ is a Wiener process with covariance
matrix

$$V = \sum_{d \in D} p_d \operatorname{Var}_d(Y)\,\big(K\nabla\eta(\theta^*, d)\big)\big(K\nabla\eta(\theta^*, d)\big)'$$

and $U(\cdot)$ is a stationary process with

$$U(t) = U(0) + \int_0^t (-KA + I/2)\,U(s)\,ds + W(t) = \int_{-\infty}^{t} \exp\big((-KA + I/2)(t - s)\big)\,dW(s).$$
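
The eigenvalue hypothesis on $KA$ is easy to check numerically. The sketch below does so for a hypothetical $2 \times 2$ symmetric positive definite matrix `A` and a gain of the form $K = cI$, for which the eigenvalues of $KA$ are $c$ times those of $A$. The matrix entries, the `margin` parameter and the helper names are assumptions for illustration only.

```python
import math

def sym_eigenvalues_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]], smallest first."""
    (a, b), (b2, c) = m
    if abs(b - b2) > 1e-12:
        raise ValueError("matrix must be symmetric")
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius

def scalar_gain_for(a_matrix, margin=0.1):
    """Return c > 0 such that K = c*I puts the eigenvalues of K A at or above 1/2 + margin."""
    lam_min, _ = sym_eigenvalues_2x2(a_matrix)
    if lam_min <= 0:
        raise ValueError("A must be positive definite")
    return (0.5 + margin) / lam_min

A = [[0.30, 0.10], [0.10, 0.20]]          # hypothetical positive definite matrix
c = scalar_gain_for(A)
KA = [[c * entry for entry in row] for row in A]
eig_min, eig_max = sym_eigenvalues_2x2(KA)
```

A scalar multiple of the identity is only the simplest admissible choice; it is generally not the gain that minimizes the asymptotic covariance.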

The proof of the previous theorem follows in a straightforward manner from
the rate of convergence theorem for stochastic algorithms with exogenous noise
and decreasing step size; see for instance 01], Theorem 10.2.2. Details are
left to the reader. A few words are needed to review the hypotheses of the
previous theorem. Theorem [5] above gives sufficient conditions for the almost
sure convergence of the sequence $(\Theta_n)$. If $\theta^*$ is the unique stationary point of
the ODE (28) in the interior of $H$, and $K$, $A$ are positive definite symmetric
matrices, then $\theta^*$ is a stable point of the ODE. The latter follows from the basic
theory of dynamical systems; see for instance Perko [51]. It is worth noting
that Theorem 0] requires the existence of a unique stationary point of the ODE 
in the interior of H. In the setting of Theorem a sufficient condition for the 
uniqueness in the interior of $H$ is discussed after Theorem 2. For any matrix
$A$, as above, it is clearly possible to find a symmetric, positive definite matrix $K$
"large enough" so that the eigenvalues of the matrix $KA$ are greater than 1/2.

Among the choices for $K$ in equation (33), the asymptotically optimal covariance
is achieved by $K = A^{-1}$. To see this, note that the limit $U(\cdot)$ satisfies

$$dU = (-KA + I/2)\,U\,dt + K\Sigma^{1/2}\,dW_0$$

where $W_0$ is the standard Wiener process. The stationary covariance is

$$\int_0^\infty e^{(-KA + I/2)t}\, K\Sigma K'\, e^{(-A'K' + I/2)t}\,dt;$$

the trace of this matrix is minimized by choosing $K = A^{-1}$, which yields the
asymptotic covariance $A^{-1}\Sigma(A')^{-1}$. See Kushner and Yin 03 for a deeper
discussion of the latter. In order to determine a choice of $K$ that is optimal
for the class of estimators proposed by equation (33), it is necessary to have a
consistent estimator of the parameter $\theta^*$. This can be accomplished by initially
employing a not necessarily optimal estimator of the type mentioned above. As
we mentioned earlier, in Section 3, it is possible to make use of Theorem 0] for
the estimation of the parameters related to the diffusion or the parameters
related to the ratio between the drift and the diffusion.
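
In the scalar case the stationary covariance integral can be evaluated by hand: for $KA > 1/2$ it equals $K^2\Sigma/(2KA - 1)$, which is minimized at $K = 1/A$ with minimum $\Sigma/A^2$. The following sketch checks this numerically. The values of `a` and `sigma2`, the grid of gains and the function names are hypothetical choices made for illustration.

```python
import math

def stationary_variance(k, a, sigma2):
    """Closed form of the integral of exp(2(-k*a + 1/2)t) * k^2 * sigma2 over t >= 0."""
    rate = 2.0 * k * a - 1.0
    if rate <= 0:
        raise ValueError("need k * a > 1/2 for a stationary limit")
    return k * k * sigma2 / rate

def stationary_variance_numeric(k, a, sigma2, t_max=60.0, steps=60_000):
    """Trapezoidal approximation of the same integral, truncated at t_max."""
    h = t_max / steps
    drift = -k * a + 0.5
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(2.0 * drift * i * h)
    return k * k * sigma2 * total * h

a, sigma2 = 2.0, 3.0                              # hypothetical scalar A and Sigma
grid = [0.26 + 0.002 * i for i in range(1000)]    # gains with k * a > 1/2
best_k = min(grid, key=lambda k: stationary_variance(k, a, sigma2))
```

Here the grid minimizer lands next to $1/A = 0.5$, in line with the matrix statement $K = A^{-1}$.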



5 Examples 

In this section we illustrate the estimation procedure with two examples from
the financial literature. We restrict attention to a one-dimensional parameter
space; the details for the multidimensional parameter space are left to the
reader.



5.1 CEV 

In this example we consider the estimation of the parameters for the constant
elasticity of variance (CEV) process introduced by Cox [12] and by Cox and
Ross [13]. The application of this process to interest rates is discussed in Marsh
and Rosenfeld.

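Let us assume that $(X_t, \mathcal{F}_t, P_x^{(\lambda,\varsigma)})_{(\lambda,\varsigma) \in \mathbb{R}^2}$ is a parametric set of diffusions
on $\mathbb{R}$ with sample space $(\Omega, \mathcal{F}_\infty)$. We assume that the differential operator of
the diffusion $(X_t, \mathcal{F}_t, P_x^{(\lambda,\varsigma)})$ is given by the formula

$$L_{\lambda,\varsigma} = \tfrac{1}{2}\sigma^2(\varsigma)\,x^{2\gamma}\frac{d^2}{dx^2} + (\mu/\sigma^2)(\lambda)\,\sigma^2(\varsigma)\,x\frac{d}{dx} \quad \text{for } \lambda, \varsigma \in \mathbb{R}$$

where $\mu/\sigma^2: \mathbb{R} \to \mathbb{R}_+$ is a differentiable function such that $d(\mu/\sigma^2)/d\lambda > 0$,
$\sigma^2: \mathbb{R} \to \mathbb{R}_+$ is a continuously differentiable function with $d(\sigma^2)/d\varsigma > 0$ that
satisfies equation (24), and $\gamma > 1$ is a fixed known constant. Let $(\lambda^*, \varsigma^*)$ be
the true parameter of the process. Let $D = \{d_1, \dots, d_s\} \subset \mathbb{R}$ be a finite set of
real numbers where $d_1 < \cdots < d_s$. First we consider the collection of diffusions
$(X_t, \mathcal{F}_t, P_x^{(\lambda,\varsigma^*)})_{\lambda \in \mathbb{R}}$. Let $f: \mathbb{R} \to \mathbb{R}$ be the identity function, and let $\eta^f$, $V^f$ be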
defined as in equations (2J| and © respectively. We suppress / in what follows. 
It follows that 

f x C xp( /j/ ' t2(a) v 2 - 2 ' 1 ) dv 

for x € D. Let (0„) and (6^) be the (T Trl ) adapted sequences of random 
variables defined as in equations JJJ) and (|31|l respectively, where H : R x D — > 
{1,-1} is the constant function taking value —1, a(d) = —(dr]/d\)(\*,d) for 
d e D and 81 is a finite T\ measurable random variable. We observe that the 
computation of the estimators 9„ does not depend on the value of <;* . 

It follows by Corollary 1 that the sequence of estimators $(\Theta_n)$ converges
almost surely $P_x^{(\lambda^*,\varsigma^*)}$ to $\lambda^*$. Let $\mu$ be a probability measure on $S$ and let
$(\Omega, P, \mathcal{F}_\infty)$ be the sample space where $P = \int P_x^{(\lambda^*,\varsigma^*)}\, d\mu(x)$. As a consequence
of Corollary 3, $n^{1/2}(\Theta_n^N - \lambda^*)$ is asymptotically normally distributed with mean
zero and variance $\sigma^2 = \sum_{d \in D} p_d E_d^{(\lambda^*,\varsigma^*)}(X_{\nu_2} - \eta(\lambda^*, d))^2/a^2(d)$, where $p = (p_i)$
is the left-fixed probability row vector of the Markov chain $(X_{\tau_n}, \mathcal{F}_{\tau_n})$ as in
Lemma 1.

Next, we assume that the parameter $\lambda^*$ is known. (The latter can be assumed
by the previous remark.) We consider the collection of diffusions $(X_t, \mathcal{F}_t, P_x^{(\lambda^*,\varsigma)})_{\varsigma \in \mathbb{R}}$.
We define rf and V 9 by equations (JSJl and © respectively where g is the iden- 
tity function. See Lemma[7]of Appendix B for a computation of these functions. 
We suppress g in what follows. Let (On), (6^) be the (J>„) adapted sequence 
of random variables as in equations (|1U|) and respectively, where Oi is a 

bounded JFi random variable, H : RxDn { — 1,1} is the constant function equal 
to 1 and &(■) — — (dfj/d(;)(<;*, •). It follows by Corollary [21 that for any x G R 
the sequence (0 n ) converges almost surely P? A ,s ' to ?*. It follows that the 
sequence of estimators (O n ) converges almost surely P^ A ' s ' to Next, it fol- 
lows by Corollary that n 1//2 (0^ — ?*) is asymptotically normally distributed 
with mean zero and variance a 2 — Ylde^P d d ^ 2 ~ ^fc*' d)) 2 /a 2 (d). 

5.2 Cox-Ingersoll-Ross 

In this example we consider the estimation of some parameters for the model
of the term structure of interest rates of Cox, Ingersoll and Ross. See Cox et al [15].
We consider the problem of the estimation of the quotient between the "speed
of adjustment" and the "volatility of the process". See Cox et al ^3] for an
explanation of this terminology. Let us assume that $(X_t, \mathcal{F}_t, P_x^{(\lambda,\varsigma)})_{(\lambda,\varsigma) \in \mathbb{R}^2}$ is a
parametric set of diffusions on $\mathbb{R}$ with sample space $(\Omega, \mathcal{F}_\infty)$. Let $(\lambda^*, \varsigma^*)$ be
the "true" parameter of the process. We assume that the differential operator
of the diffusion $(X_t, \mathcal{F}_t, P_x^{(\lambda,\varsigma)})$ is given by the formula

$$L_{\lambda,\varsigma} = \tfrac{1}{2}\sigma^2(\varsigma)\,x\frac{d^2}{dx^2} + (\mu/\sigma^2)(\lambda)\,\sigma^2(\varsigma)\,(a - x)\frac{d}{dx} \quad \text{for } \lambda, \varsigma \in \mathbb{R}$$

where $a \in \mathbb{R}_+$ is a given known constant, $\mu/\sigma^2: \mathbb{R} \to \mathbb{R}_+$ is a differentiable
function with $d(\mu/\sigma^2)/d\lambda > 0$ and $\sigma^2: \mathbb{R} \to \mathbb{R}_+$ is a continuously differentiable
function with $d(\sigma^2)/d\varsigma > 0$ that satisfies equation (24). Let $D = \{d_1, \dots, d_s\} \subset
(0, a) \cup (a, \infty)$ be a set of positive real numbers such that $d_1 < \cdots < d_s$. First,
we consider the collection of diffusions $(X_t, \mathcal{F}_t, P_x^{(\lambda,\varsigma^*)})_{\lambda \in \mathbb{R}}$. Let $f: \mathbb{R} \to \mathbb{R}$ be the
identity and let rf , V? be defined as in equations Q and © respectively. We 
suppress / in what follows. Moreover we assume that Ud C [0, a) U (a, 00) for 
d G D. Although a diffusion with differential operator as above is not a regular 
Markov process on R, we observe that if l/2a > (/i/(T 2 )(A) (l/2a < (n/a 2 )(X)) 
then the part of the process on [0, 00) (on (0, 00)) (see ^H], volume I for a 
definition of the part of a process on a subset of the state space) is a regular 
diffusion on [0, 00) (on (0,oo)) in the sense of definition 15.1 of Dynkin |19l 
volume II, p. 121]. For a discussion of this see for example Cox et al ^21- Either 
case follows from the analysis of the boundary classification criteria; see for
instance Gihman and A.V. Skorohod [28]. In either case Condition ^ is satisfied
if $a, b \in (0, \infty)$. It follows that

n J (A,x) = {D r (x) - D;(x)| 'V, ) + Di(x) 

C ( i) ) 2 /- 2 ^ 2 )^exp(2( M /^)(A) 2/ )d/ 

for x <E D. Let Oi be a finite random variable, and let (9 n ) be the (J>„) 
adapted sequence of random variables defined as in equation (JJJ) where H : R x 
D — > {1,-1} is the function defined by the formula H{\,d) = l(-oo,a)(^) — 
l(a,oo)(^)- If A* e 1 is a fixed number it follows by Corollary 2] that the se- 
quence (0„) of random variables converges to A* almost surely P^ A ' 5 ' for any 
x G (0, oo). Next, we assume that the parameter A* is known. We consider 

the collection of diffusions (Xt,.F t ,Px ' ? ') s gr. We define rf and V 9 by equa- 
tions (JSJ) and 10 respectively, where g is the identity function, D C (0,oo) and 
Ud C (0, oo). Let (O ra ) be a sequence of random variables as in equation IjlOJI 
where Oi is a bounded random variable and H : ExDh { — 1, 1} is the constant 
function equal to 1. It follows by Corollary [21 that for any <,* S K., the sequence 
(O n ) converges almost surely Pi A ' s ^ to <;* for any x S (0, oo). 

Similar considerations can be made if we want to estimate the "central
location" or "long term value" of the process and the diffusion coefficient. See Cox
et al for an explanation of this terminology.

Last, we notice that, as a consequence of Corollary 3 and Corollary 4, asymptotic
normality of the appropriate normalization of the estimators constructed
for the Cox-Ingersoll-Ross model can be obtained. The details are left to the
reader.



6 Conclusion 



The thrust of this paper has been to introduce the ideas of stochastic
algorithms to the problem of the estimation of parameters of a continuous diffusion
process using observed discrete data. The latter could potentially be useful for
the study of non-time-homogeneous diffusion processes. In addition, we have
proposed sampling schemes that depend on space discretization rather than time
discretization. These sampling schemes are closer to the Markov character of
diffusion processes. We also propose a new parameterization of diffusions that we
believe is closer in spirit to the initial attempts made in probability to describe
a diffusion by its "road map" and "speed". The main results given here
(construction of sequences of estimators, asymptotic consistency of such sequences,
and asymptotic normality of such sequences), as well as the two examples taken
from mathematical finance, deal with families of diffusion processes that have
a one-dimensional state space and a multidimensional parameter space.

Future questions will center on the generalization of the current techniques
for use in the case of a multidimensional state space and the development, for
the current setting, of stochastic algorithms appropriate to the description of
non-time-homogeneous diffusions. A particularly interesting question is to find
sufficient conditions on multidimensional parameterizations of diffusion
operators that guarantee identification of the corresponding process from moment
conditions of the type presented in this paper. Another direction of research
can be centered on the effective computation of the stochastic algorithms
presented here and their comparison with other techniques.
A Appendix. Proofs

Proof of Theorem QJ If we define T n — Q n — 9* then equation 1(11(1 becomes 
T n+1 = T n - j n H(G n , Y n )V(O n , Y n , X n+1 ) 

It follows that 

E(||T n+1 || 2 | T n ) - \\T n \\ 2 = -2j„T n ■ H(9 n ,Y n )V(G n ,Y n ) 

+ 7 lS 2 (O n ,Y n ) < 7^(1 + ||T„|| 2 ) (35) 

where the last inequality follows by equations 1)14(1 and ((16(1 . Moving the terms 
that have either \\T n \\ or ||T n+ i|| to the left of the previous equation we obtain 

E(||T n+1 || 2 | T n ) - ||T„|| 2 (1 + K-£) < Kj 2 n (36) 

Define H„ = Uk^ii 1 + K ll) and let T « = (rTT^ T "- We observe that the 
sequence (II„) is a convergent sequence of positive numbers since (logII n ) con- 
verges by Hypothesis H3. Using equation ((36(1 we obtain 

E(||T; i+1 || 2 |^)-||^|| 2 <if-^ (37) 

If F n = {uj e n I E(|jT; +1 || 2 | T n ) - ||T;j| 2 > 0}, then equation ® and 
Hypothesis H3 imply that 

00 

^E(i F „(||r; l+1 || 2 -||r;j| 2 ))<oo 

n=l 

It follows by the almost sure convergence of quasi-martingales that T n converges 
almost surely toward a positive integrable random variable (see, for example, 
Theorem 9.4 page 49 and Proposition 9.5 of Metivier |2Z|). We conclude that 
the same property holds for T n . The next step of the proof is to prove that the 
convergence of T n is to zero. By inequality ((37(1 . Hypothesis H3, the definition 
of T n and the fact that 6q belongs to L 2 (P), it follows that 

supE(||T„|| 2 ) < 00 (38) 

n 

We also observe that 

00 

< J2 27«E(T„ • H(O n , Y n )V(0 n ,Y n )) 

n=l 

00 00 

< E(I|T„|| 2 ) - E(||T„ +1 || 2 ) + (1 + su P E(||T fc || 2 )) £ if 7 2 < 00 
— : k>o — ; 

n—l — n—l 
where the last inequality follows by equations (35), (38), and Hypothesis H3.
Since 7„ fc = oo there exists a subsequence (n' k ) of (rifc) such that 

Urn E(T„/ • (0 n , , F„ ; )F(0„ ; , y ni ) = (39) 
equation l|39|) implies that for any s > 0, liminf/c \\T • || < £ almost surely. To 

k 

prove this last statement, let us assume otherwise. Then there exists e > such 
that T n > k > e for all k big enough on some set A of probability greater than zero. 
It would follow by Fubini's theorem and equation (jT5|l that there exists <5 > 
such that 

J T K ■ H(0 K , Y K )V(O k , Y K )dP 
h 

> J T K ■ H(0 K ,Y K )V(e K ,Y n , k )dP > JsdP> SP(A) > 

A A 

for all k big enough. The former is in contradiction with equation (|39(l . Since 
(T n ) converges almost surely, then T n — > almost surely. □ 

Proof of Corollary Q Let /jbea probability measure on R supported on S, and 
let P = J dfj,. We define V : R x D i-> M and S" 2 : R x D i-> R by the following 
formulas: 

V(\,d)=r) f (\*,d)-T] f (\,d) (40) 
S 2 (A, d) = (r/(A*,<i) - rj f (X, d)) 2 + ( V f2 (X*,d) - (rj f (X* ,d)) 2 ) (41) 

It follows by the strong Markov property of (X t ,J-t,Px ) that Conditions Ai 
and A2 of Theorem ^ are satisfied. Since rjf (■, d) for any d £ D is bounded and 
D is finite it follows that Property H2 of theorem^^s satisfied. By Corollary H3 
of Appendix B, equation (|14l) of Thcorcm^s satisfied. Last, we notice that 

s 

E((A- X*)H(X,X Tn )V(X,X Tn )) = Y,( X ~ X*)V(X,d m )P(X Tn = d m ) (42) 

711 = 1 

By the last equation and Corollary of Appendix A, equation (f 1 f>p holds. It 
follows by Theorem that the sequence of random variables (O n ) converges 
almost surely P to A* . Since the last statement holds for any initial probability 
measure supported on S the result follows. □ 

Proof of Corollary [U Let fi be a probability measure on R supported on S, and 
let P = / P% dfj,. If we define V: R x D ^ R and S 2 : R x D ^ R by: 

tM)^V,d)-^'M) (43) 

s 2 M) = d) - ^M)) 2 + (^V,*) - x)) 2 ) (44) 
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It follows by the strong Markov property of (Xt,J 7 t,'P% ) that Conditions A\ 
and Ai of Theorem ^ are satisfied. By Assumptions ^ an d [21 and Lemma [7] 
of Appendix B, it follows that Property Hi of Theorem ^ is satisfied. The 
monotonicity and differentiability of o~q, Property[21of Corollary [21 and Lemma0 
of Appendix B imply equation (|14|) of Theorem 2] Last, we notice that 

s 

E{(<;-<;*)H{<;,X Tn )V(<;,X Tn )) = ^ («■ - <;*)¥{<;, d m )P(X Tn = d m ) (45) 

m— 1 

By the last equation and Corollary of Appendix A, equation Ijl5|l holds. It 
follows by Theorem ^ that the sequence of random variables (Q n ) converges 
almost surely P to ?* . Since the last statement holds for any initial probability 
measure supported on S the result follows. □ 

The following lemma is used in the proof of Theorem [21 Theorem 0] and 
Theorem [3] 

Lemma 1. Let $f: D \to \mathbb{R}$ and let $A = (a_{i,j})$ be the irreducible transition matrix of
the Markov chain $(X_{\tau_n}, \mathcal{F}_{\tau_n})$ with left-fixed probability row vector $p = (p_j) \gg 0$.
Then for any $x \in S$

$$n^{-1}\sum_{k=1}^{n} f(X_{\tau_k}) \to \sum_{d \in D} f(d)\,p_d \quad \text{a.s. } P_x \qquad (46)$$

Remark 1. $A$ is an irreducible matrix. This follows from the recurrence of the
Markov chain $(X_{\tau_n}, \mathcal{F}_{\tau_n})$. Hence, there exists a unique left-fixed probability
vector $p$; moreover, the entries of $p$ are strictly positive. See for instance Petersen [52].

Proof. Let $\tilde{P}_d$ for $d \in D$ be the restriction of $P_d$ to $\sigma(X_{\tau_1}, X_{\tau_2}, \dots)$ and let
$\tilde{P}$ be the probability measure defined on $\sigma(X_{\tau_1}, X_{\tau_2}, \dots)$ by the formula
$\tilde{P} = \sum_{d \in D} p_d \tilde{P}_d$. By the strong Markov property $\theta_{\tau_2}$ defines a measure-preserving
transformation. Indeed, $(X_{\tau_n})$ is an irreducible Markov shift. It follows by
the pointwise ergodic theorem and the fact that an irreducible Markov shift is
ergodic that equation (46) holds a.s. $\tilde{P}$ (see Petersen [52]); the result follows
using the strong Markov property and the fact that the components of the
invariant vector $p$ are positive. □
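
Lemma 1 can be checked numerically on a small example. The sketch below computes the left-fixed vector $p$ of a hypothetical nearest-neighbour chain on four trigger values and compares $\sum_d f(d)p_d$ with a simulated time average. Because such a chain has period 2, plain power iteration would oscillate, so the sketch iterates the lazy chain $(A + I)/2$, which has the same left-fixed vector. The matrix `A`, the function values `f` and the helper names are assumptions for illustration.

```python
import random

def left_fixed_vector(a, iterations=5000):
    """Approximate p with p A = p by iterating the lazy chain (A + I)/2."""
    s = len(a)
    p = [1.0 / s] * s
    for _ in range(iterations):
        pa = [sum(p[i] * a[i][j] for i in range(s)) for j in range(s)]
        p = [0.5 * (p[j] + pa[j]) for j in range(s)]
    return p

def empirical_average(a, f, n_steps, start=0, seed=1):
    """Time average n^{-1} * sum_k f(state_k) along one simulated path."""
    rng = random.Random(seed)
    state, total = start, 0.0
    for _ in range(n_steps):
        total += f[state]
        state = rng.choices(range(len(a)), weights=a[state])[0]
    return total / n_steps

# Hypothetical irreducible nearest-neighbour chain on four states (period 2).
A = [[0.0, 1.0, 0.0, 0.0],
     [0.4, 0.0, 0.6, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
f = [1.0, 2.0, 3.0, 4.0]

p = left_fixed_vector(A)
ergodic_limit = sum(fd * pd for fd, pd in zip(f, p))
time_average = empirical_average(A, f, n_steps=100_000)
```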

Theorem 5. Let $Z: (\Omega, \mathcal{F}_{\tau_2}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ be a measurable map that is bounded
below. Moreover, assume that $Z \in \bigcap_{d \in D} L^2(P_d)$ and $E_d(Z) = 0$ for $d \in D$. Let
$Z_n$ be defined as $Z_n = Z \circ \theta_{\tau_n}$ for $n \ge 2$ and $Z_1 = Z$. Then for any initial
probability measure, the distribution of

$$n^{-1/2}\sum_{k=1}^{n} Z_k \qquad (47)$$

approaches the normal distribution with mean zero and variance
$\sigma^2 = \sum_{d \in D} p_d E_d(Z^2)$, where $p = (p_i)$ is the left-fixed probability row vector of the
Markov chain $(X_{\tau_n}, \mathcal{F}_{\tau_n})$ as in Lemma 1.
Proof. Let $\mu$ be a probability measure on $S$, and let $(X_t, \mathcal{F}_t, P_\mu)$ be the Markov
process with initial probability measure $\mu$. We observe that $(Z_n)$ is an $(\mathcal{F}_{\tau_{n+1}})$
adapted process and

$$E_\mu(Z_n \mid \mathcal{F}_{\tau_n}) = 0 \qquad (48)$$
E^J-E^JZ 2 ) <oc (49) 

for $n \ge 1$. The proof of the theorem follows using the strong Markov property,
Lemma 1, and a line of argument similar to the proof of the
Lindeberg-Lévy theorem for martingales (see Billingsley [8]). □

For the proof of Theorem[j|]we make use of the following easily proved lemma. 

Lemma 2. Let {(3k,n | < k < n} be the double indexed sequence of positive 
numbers defined as (3k,n = ~ I/*)- Then 

(1 - e fe )- <p hn < (1 + e*)- (50) 
n n 

where (e^) is a sequence of positive numbers such that ej. — » as fc — > oo. In 
particular n 1 / 2 ^^ — > as n -> oo /or any /ixed fc. 

Proof of Theorem^ We observe that the sequence of random variables (0„ ) 
converges almost surely P to 6* by Theorem |2 Let V: MxD^l and 5 2 :Mx 
D^Ibc defined by the formulas: 

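$$\bar{V}(\theta, d) = \eta(\theta^*, d) - \eta(\theta, d)$$

$$S^2(\theta, d) = \big(\eta(\theta^*, d) - \eta(\theta, d)\big)^2 + E_d\big(Y - \eta(\theta^*, d)\big)^2$$

By Hypothesis $N_1$ it follows that for $\theta \in \mathbb{R}$, $d \in D$,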

_7 (M) >o 

V ; (5y/56')(6'*,d) " 

Hypothesis iV2 and the fact that a is defined on a finite set and is nowhere 
zero imply that S 2 /a 2 satisfies Hypotheses H2 of Theorem ^ Using the strong 
Markov property and an argument similar to the one used in the proof of Corol- 
lary^we can prove that (0^ ) converges almost surely to 9* . Let Z : Dxl^l 
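be defined as $Z(d, y) = y - \eta(\theta^*, d)$. We observe that $Z(d, y) = V(\theta, d, y) -
\bar{V}(\theta, d)$ for any $\theta \in \mathbb{R}$. We denote by $\delta: \mathbb{R} \times D \to \mathbb{R}$ the function defined by the
formula $\bar{V}(\theta, d) = (\partial/\partial\theta)\bar{V}(\theta^*, d)(\theta - \theta^*) + \delta(\theta - \theta^*, d)$. If we define $T_n^N = \Theta_n^N - \theta^*$
for $n \ge 1$, it follows that

$$T_{n+1}^N = \Big(1 - \frac{1}{n}\Big)T_n^N - \frac{\delta_n}{n a_n} - \frac{Z_n}{n a_n} \qquad (51)$$

where $Z_n = Z(X_{\tau_n}, Y_{n+1})$, $\delta_n = \delta(T_n^N, X_{\tau_n})$ and $a_n = a(X_{\tau_n})$. Iteration of
equation (51) yields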

fe=l K k=l K 
Hence, we can prove that $n^{1/2} T_n^N$ is asymptotically normal with mean zero and
variance $\sigma^2$ by proving that

n 1/2 Po, n T^ -> almost surely, 

n i/2 y P±n_ h_ ^ Q . n probabmty) rm^ 
ti k ak 

n X/2y { ^n _ 1 Z(X Tk ,Y k+1 ) ^ Q in probabiUty; ^ 

* — ' k n a k 



k=l 



n -i/2 y Z{X Tk ,Y k+1 ) ^ ^ . n distribution ^ ™ 
fc=l K 

Equation (|52fa ) follows by Lemma[21 Next, we observe that the terms on the left 
hand side of equation are uncorrelated. We observe that Y £ f] del) L 2 (Pd) 
and a is a nowhere zero function defined on a finite set. It follows by the strong 
Markov property, and Lemma |21 that there exists a constant C > such that 

E( „v^ ( ^_I )( ffi^i) ))2 <^ 4 m 

k=l y k! k=l 

where the right-hand side of equation goes to zero as n — > oo since e k — > 
as k — > oo. Convergence in equation (|52b ) follows by Chebyshev's inequality. 
Next we prove the convergence of equation (|52*b ) . We observe that equation ijlTFjl 
and the proof of Theorem pimply 

limsup7iE(T^) 2 < oo (54) 

n 

Let ei,e2 > 0, and let S(-) = max ( j eD (5(-, d)/a{d)). Since S(x) = o(\ x |) there 
exists e' > such that 

I 6{x) \< t\ | x | for | a; |< e' (55) 
Since — > almost surely, there exist Ni > such that 

P(|Tf|<e',fc>Ar 1 )>l-e 1 (56) 
It follows using equation the triangle inequality, equation JSBJ), Markov's 
inequality, and Lyapounov's inequality that
k a k 



p(|n i/ 2 y |>£a) 



k=Ni 



<e 1 + P(\nV 2 J2 ^^- k \>e2,\T k N \<e',k>N 1 ) 



k=Ni 



< £l +P(^ 1 /2 ^ ^ ]T N >€2) (5?) 
fc=iVi 

<e 1 + e 2 E(n^ ^ ^ | T f |) 

fe=jVi 

The convergence in probability in equation i|52b) follows using equation l|57|l 
and Lemma |21 Finally, we observe that the convergence in distribution of equa- 
tion i|52t0 is a consequence of Theorem [5] □ 

Proof of Corollary^ We observe that X Vn = X U2 o 9 Trl _ 1 — X Vl o #™~ 2 for 
n > 3. Indeed, rj satisfies Condition Ni of Theorem 01 by Corollary H3 and by 
the definition of r/ Condition N2 is satisfied. The result is a straightforward 
consequence of Theorem □ 

Proof of Corollary^ We observe that v n — r„_i = ^2 #t„_i = ^2 0? 2 f° r 
n > 3. By Lemma [7] 77 satisfies Condition iVi of Theorem [21 and fj satisfies 
Condition N2 by Lemma The result is a straightforward consequence of 
Theorem E □ 

B Appendix 

In this appendix we derive some technical results about the transition matrices
of the Markov chain $(X_{\tau_n}, \mathcal{F}_{\tau_n})$. In this section all the matrices are stochastic
matrices. We state some easily proved results, whose proofs are left to the reader.



Definition 1. We say that a matrix $A$ of size $s \times s$ is of type I if for all
$i, j \in \{1, \dots, s\}$, $i \equiv j \pmod 2$ implies $a_{i,j} = 0$. We say that a matrix $A$
of size $s \times s$ is of type II if $i \equiv j + 1 \pmod 2$ implies $a_{i,j} = 0$ for all
$i, j \in \{1, \dots, s\}$.

Lemma 3. Let $A$ and $B$ be two $s \times s$ matrices. Then the following hold:
If $A$ and $B$ are both matrices of type I then $AB$ is a matrix of type II.
If $A$ and $B$ are matrices of type II then so is $AB$.
If $A$ is a matrix of type I and $B$ is a matrix of type II then $AB$ and $BA$ are
matrices of type I.



Definition 2. Given a matrix $A$ of type II we define $P_{even}(A)$ and $P_{odd}(A)$ to
be the matrices of size $[\frac{s}{2}] \times [\frac{s}{2}]$ and $[\frac{s+1}{2}] \times [\frac{s+1}{2}]$, respectively, defined by
the following formulas:

$$(P_{even}(A))_{i,j} = a_{2i,2j} \quad \text{for } i, j \in \{1, \dots, [\tfrac{s}{2}]\}$$

$$(P_{odd}(A))_{i,j} = a_{2i-1,2j-1} \quad \text{for } i, j \in \{1, \dots, [\tfrac{s+1}{2}]\}$$

Lemma 4. If $A$ and $B$ are $s \times s$ matrices of type II then the following property
holds:

$$P_{even}(A)P_{even}(B) = P_{even}(AB)$$

$$P_{odd}(A)P_{odd}(B) = P_{odd}(AB)$$

In particular, for any positive integer $n$,

$$P_{even}(A^n) = (P_{even}(A))^n$$

$$P_{odd}(A^n) = (P_{odd}(A))^n$$

It is obvious that a matrix $A$ of type II is completely determined by $P_{odd}(A)$
and $P_{even}(A)$.



Lemma 5. Let $A = (a_{ij})$ be an $s \times s$ matrix whose entries are non-negative
and such that $a_{ij} \ne 0$ for $|i - j| \le 1$. Then for any positive integer $n$, $A^n$ is a
matrix such that $a^n_{ij} \ne 0$ for $|i - j| \le n$, where $A^n = (a^n_{ij})$.

We observe that if $A$ is the transition matrix of the Markov process $(X_{\tau_n}, \mathcal{F}_{\tau_n})$
then $P_{odd}(A^2)$ and $P_{even}(A^2)$ satisfy the condition of the previous lemma.

Corollary 5. If $A$ is an $s \times s$ transition matrix of a Markov process $(X_{\tau_n}, \mathcal{F}_{\tau_n})$,
then $P_{odd}(A^{2n})$ and $P_{even}(A^{2n})$ converge to stochastic matrices $A_1$ and $A_2$.
Indeed, there are matrices $C_1$ and $C_2$, where $C_1$ is a $1 \times [\frac{s+1}{2}]$ matrix and $C_2$ is
a $1 \times [\frac{s}{2}]$ matrix, such that $A_1 = (1, \dots, 1)'C_1$ and $A_2 = (1, \dots, 1)'C_2$ and the
components of $C_1$ and $C_2$ are positive.

Proof. The result follows from Lemma 4, Lemma 5, the observation made right
after Lemma 5, and the fundamental theorem for regular Markov
chains. See for example Kemeny and Snell, Theorem 4.1.4. □
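
The parity structure used in this appendix can be seen on a small example. The sketch below takes a hypothetical nearest-neighbour transition matrix (type I), squares it to obtain a type II matrix, extracts the odd and even blocks, and raises the odd block to a high power to observe convergence to a matrix with identical positive rows. The matrix `A` and the helper names are assumptions for illustration; indices in the code are 0-based, so the paper's odd indices correspond to rows 0, 2, and so on.

```python
def matmul(a, b):
    """Plain product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_type_two(a, tol=1e-12):
    """Type II: the (i, j) entry vanishes whenever i and j differ in parity."""
    n = len(a)
    return all(abs(a[i][j]) <= tol
               for i in range(n) for j in range(n) if (i - j) % 2 == 1)

def p_odd(a):
    """Entries a_{2i-1,2j-1} in 1-based indexing (0-based rows and columns 0, 2, ...)."""
    idx = range(0, len(a), 2)
    return [[a[i][j] for j in idx] for i in idx]

def p_even(a):
    """Entries a_{2i,2j} in 1-based indexing (0-based rows and columns 1, 3, ...)."""
    idx = range(1, len(a), 2)
    return [[a[i][j] for j in idx] for i in idx]

# Hypothetical nearest-neighbour chain on four states; it is of type I.
A = [[0.0, 1.0, 0.0, 0.0],
     [0.4, 0.0, 0.6, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
A2 = matmul(A, A)            # type II
A4 = matmul(A2, A2)

odd_limit = p_odd(A2)
for _ in range(6):           # repeated squaring gives (P_odd(A^2))^64
    odd_limit = matmul(odd_limit, odd_limit)
```

For this example both rows of `odd_limit` are approximately (0.25, 0.75), which plays the role of the row vector $C_1$ in Corollary 5.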



C Appendix 



In this appendix we state and prove some results that are needed for the actual
computation of the moments required for the construction of the algorithms
proposed. Let us assume that $(X_t, \mathcal{F}_t, P_x)$ is a regular diffusion on $S$, where $S$
is an interval of $\mathbb{R}$. We assume that the differential operator $L$ of the diffusion is
given by

$$L = \tfrac{1}{2}\sigma^2(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} \qquad (58)$$

where $\sigma^2: S \to \mathbb{R}_+$, $b: S \to \mathbb{R}$ satisfy Condition 2. Moreover we assume that
$b/\sigma^2 \in C([c, d]) \cap C^2((c, d))$ where $c, d \in S$ and $c < d$. It follows that $s: [c, d] \to \mathbb{R}$
defined by

$$s(x) = \int_c^x \exp\Big\{-\int_c^y \frac{2b(z)}{\sigma^2(z)}\,dz\Big\}\,dy \qquad (59)$$

belongs to $C([c, d]) \cap C^2((c, d))$. It is an elementary exercise to verify that $s$
satisfies the equation $Ls = 0$, with initial condition $s(c) = 0$. It follows by
Theorem 13.16, volume II, of Dynkin [19] that

$$\{f(d) - f(c)\}\frac{s(x)}{s(d)} + f(c) = E_x f(X_{\tau_c \wedge \tau_d}) \qquad (60)$$
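
Formula (60) lends itself to direct numerical evaluation. The sketch below approximates the scale function (59) with nested trapezoidal rules and evaluates the right-hand side of (60) for a hypothetical diffusion with constant $b/\sigma^2 = 1/2$ on $[c, d] = [0, 2]$, where the scale function is available in closed form for comparison. The function names, the argument `ratio` (standing for $z \mapsto b(z)/\sigma^2(z)$) and the numerical values are assumptions for illustration.

```python
import math

def scale_function(x, c, ratio, steps=2000):
    """Approximate s(x) = int_c^x exp(-int_c^y 2*ratio(z) dz) dy by trapezoidal rules."""
    h = (x - c) / steps
    inner = 0.0              # running value of int_c^y 2*ratio(z) dz
    prev_density = 1.0       # exp(-inner) at y = c
    total = 0.0
    for i in range(1, steps + 1):
        y_prev = c + (i - 1) * h
        y = c + i * h
        inner += h * (ratio(y_prev) + ratio(y))   # trapezoid for 2*ratio on [y_prev, y]
        density = math.exp(-inner)
        total += 0.5 * h * (prev_density + density)
        prev_density = density
    return total

def exit_expectation(f, x, c, d, ratio):
    """Evaluate {f(d) - f(c)} * s(x)/s(d) + f(c), the left-hand side of (60)."""
    hit_d_first = scale_function(x, c, ratio) / scale_function(d, c, ratio)
    return (f(d) - f(c)) * hit_d_first + f(c)

# Hypothetical case: b / sigma^2 = 1/2, so s(x) = 1 - exp(-x) when c = 0.
value = exit_expectation(lambda y: y, x=1.0, c=0.0, d=2.0, ratio=lambda z: 0.5)
closed_form = 2.0 * (1.0 - math.exp(-1.0)) / (1.0 - math.exp(-2.0))
```

With $f$ the identity, `value` is the expected position at the first exit from $(c, d)$ started at $x = 1$, the kind of quantity that enters the moment functions used by the estimators.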

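Let $\Lambda$ be an interval of $\mathbb{R}$. Assume that $(X_t, \mathcal{F}_t, P_x^{\lambda})_{\lambda \in \Lambda}$ is a parametric set of
diffusions on $\mathbb{R}$, with sample space $(\Omega, \mathcal{F}_\infty)$ and differential operators $(L_\lambda)_{\lambda \in \Lambda}$,
where $L_\lambda$ is given by the formula

$$L_\lambda = \tfrac{1}{2}\sigma^2(x, \lambda)\frac{d^2}{dx^2} + b(x, \lambda)\frac{d}{dx} \qquad (61)$$

Here we assume that $b(\cdot, \lambda)$ and $\sigma^2(\cdot, \lambda)$ satisfy the hypotheses of this appendix,
where as before $c < d$ in $S$ are fixed constants. We wish to find
conditions on $b$, $\sigma^2$ that guarantee that the function given by $\lambda \mapsto E_x^{\lambda} f(X_{\tau_c \wedge \tau_d})$ is
monotone decreasing (or increasing). For this we prove the following lemma: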



Lemma 6. Let c < d be real numbers and let A be a closed interval o/R. Let 
f : [c,(f|xAn (0, oo ) be a jointly continuous positive function such that df/dX 
is also jointly continuous. Let us assume that 

— (x, A) = f(x,X)g{x,X) 

where g is a strictly increasing (strictly decreasing) function in x for each A G A. 
It follows that the function 

Mx ' A) f:f(y,X)dy 

is a strictly increasing (strictly decreasing) function in A for any x G [c, d]. 
Proof. We prove that $g$ strictly increasing in $x$ implies that $h$ is a strictly increasing
function of $\lambda$. (The proof that $g$ strictly decreasing implies that $h$ is strictly
decreasing is similar.) By the dominated convergence theorem

dh y _ H f(y, A) dy £ §L (y, A) dy - £ f(y, A) dy JJ % (y, A) dy 
0X [X ' ' {j:f(y,X)dyy 

The result follows using the following inequalities: 

{J f(y, A) dy)g(x, A) < J ^(y, A) dy < (J f(y, A) dy)g(d, A) 
(J* f(y, A) dy)g(c, A) < J' ^ (y, A) < ( jf /(y, A) dy)^, A) 

□ 

Corollary 6. Lef (AT f , J-j, P^)^eA &e a parametric set of diffusions with differ- 
ential operators (L\)xeA as ^ n equation \61\l . Assume that b(-,\) and a 2 (-,X) 
satisfy the hypotheses of this appendix where as before c < d in S are fixed con- 
stants. Let r\ : [c, d] x A i— > M. be defined by the formula r)(x,X) = E^/(X TcATd )- 
If d/dX(b/a 2 )(x, A) > (<)0 for all (x,X) e [c,d] x A then rj(x,X) is a strictly 
decreasing (increasing) function on X for all x G [c, d] 

Proof. The result follows by Lemma Eland equation Q61J1. □ 

Finally we mention a result that allows us, in the case of diffusions with an 
one-dimensional state space, to compute the expected values of exit times from 
open sets. 



Lemma 7. Let $(X_t, \mathcal{F}_t, P_x)$ be a regular diffusion on $S$, where $S$ is an interval
of $\mathbb{R}$. We assume that $L$ is the differential operator of the diffusion, where $L$ is
defined by equation (58), and we assume that $\sigma^2$ and $b$ satisfy the hypotheses of
this appendix. Set

$$\varphi(x) = \exp\Big\{-\int_c^x \frac{2b(z)}{\sigma^2(z)}\,dz\Big\}$$

Then

$$E_x\,\tau_c \wedge \tau_d = \eta(x) = -\int_c^x 2\varphi(y)\int_c^y \frac{1}{\sigma^2(z)\varphi(z)}\,dz\,dy + \frac{\int_c^x \varphi(z)\,dz}{\int_c^d \varphi(z)\,dz}\int_c^d 2\varphi(y)\int_c^y \frac{1}{\sigma^2(z)\varphi(z)}\,dz\,dy \qquad (62)$$

Moreover u{x) = E x (t c A Td) 2 < oo for any x € [c, d] and u is the solution of 
the differential equation 

Lu = —77 (63) 
with boundary conditions u{c) = u(d) = 
Proof. Equation (62) follows by Theorem 13.16, volume II, of Dynkin [19] and a
straightforward computation. The latter part of the lemma follows by Theorem
1.15.3 of Gihman and A.V. Skorohod [28]. □